[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/18323

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130526899

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
     with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ * Returns the bucket number into which
+ * the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
--- End diff --

The comments are copied from https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130526538

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
--- End diff --

To be consistent with Oracle:

![oracle-width_bucket](https://user-images.githubusercontent.com/5399861/28811683-65c7c5a8-76c3-11e7-990b-4e2ca41eca6e.jpeg)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130038008

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
--- End diff --

minValue.eval()?
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130037738

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+  override def eval(input: InternalRow): Any = {
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
--- End diff --

Is the above null handling valid? Except for null-tolerant expressions, if any input to an expression is null, it usually returns null.
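The convention viirya describes (null in, null out) and the error-raising behavior in the quoted diff can be contrasted in a small standalone sketch. This is plain Scala, not Spark's actual `Expression` API; the object and method names are illustrative only:

```scala
// Standalone sketch contrasting the two null-handling policies discussed in
// this thread: the usual null-intolerant convention (any null input => null
// result) versus the WIDTH_BUCKET behavior in the diff, which raises an
// error when the min/max/num_bucket arguments (positions 2..4) are null.
object NullPolicies {
  // Usual convention: if any input is null, the result is null.
  def nullTolerantEval(args: Seq[Any])(f: Seq[Any] => Any): Any =
    if (args.contains(null)) null else f(args)

  // WIDTH_BUCKET-style: null in arguments 2..4 is an error; a null first
  // argument still yields null.
  def widthBucketEval(args: Seq[Any])(f: Seq[Any] => Any): Any = {
    args.zipWithIndex.drop(1).foreach { case (a, i) =>
      if (a == null) {
        throw new RuntimeException(
          s"The argument [${i + 1}] of WIDTH_BUCKET function is NULL or invalid.")
      }
    }
    if (args.head == null) null else f(args)
  }
}
```

Under the first policy a null range bound would silently make the whole expression null; under the second it surfaces as an error, which is the Oracle-compatible behavior wangyum is defending.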
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130036804

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -631,3 +632,109 @@ abstract class TernaryExpression extends Expression {
     }
   }
 }
+
+/**
+ * An expression with four inputs and one output. The output is by default evaluated to null
+ * if any input is evaluated to null.
+ */
+abstract class QuaternaryExpression extends Expression {
--- End diff --

We can remove this based on your current code.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130036608

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+case class WidthBucket(
+    expr: Expression,
+    minValue: Expression,
+    maxValue: Expression,
+    numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def eval(input: InternalRow): Any = {
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
+      } else {
+        val exprV = expr.eval(input)
+        if (exprV == null) {
+          null
+        } else {
+          MathUtils.widthBucket(exprV.asInstanceOf[Double], minValueV, maxValueV, numBucketV)
+        }
+      }
+    } else {
+      val evals = children.map(_.eval(input))
+      val invalid = evals.zipWithIndex.filter { case (e, i) =>
+        (i > 0 && e == null) || (i == 3 && e.asInstanceOf[Long] <= 0)
+      }
+      if (invalid.nonEmpty) {
+        invalid.foreach(l => throw new RuntimeException(errMsg.format(l._2 + 1)))
+      } else if (evals(0) == null) {
+        null
+      } else {
+        MathUtils.widthBucket(
+          evals(0).asInstanceOf[Double],
+          evals(1).asInstanceOf[Double],
+          evals(2).asInstanceOf[Double],
+          evals(3).asInstanceOf[Long])
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
--- End diff --

ditto. As you override `doGenCode`, I don't see a reason to create `QuaternaryExpression` for this expression.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130036586

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
--- End diff --

Why did you create `QuaternaryExpression` with a default `nullSafeEval` but not use it? If you want to do so, you don't need `QuaternaryExpression`; simply let `WidthBucket` extend `Expression`.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130035216

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@
+/**
+ * Returns the bucket number into which
+ * the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
--- End diff --

Is this comment correct? How about "the expression whose bucket number in the histogram is returned"?
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130021930

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1219,44 +1219,91 @@ case class WidthBucket(
   override def dataType: DataType = LongType
   override def nullable: Boolean = true
 
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
--- End diff --

If `minValue.eval(EmptyRow) == null`, then `minValue.eval(EmptyRow).asInstanceOf[Double]` will be `0.0`, so both values are kept here.
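wangyum's observation can be checked in isolation; a minimal plain-Scala sketch (nothing Spark-specific) of why the boxed and cast values are kept separately:

```scala
// Why _minValue (boxed) and minValueV (cast) are kept as two values:
// in Scala, casting null to a primitive type does not fail; it unboxes to
// that type's default value, which is 0.0 for Double. The null check
// therefore has to happen on the boxed value, before the cast.
val boxed: Any = null
val cast: Double = boxed.asInstanceOf[Double]  // yields 0.0, not an exception
```

A null check written against `cast` alone would never fire, which is exactly the bug this two-value pattern avoids.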
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r128569724

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,77 @@
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+    children.map(_.eval(input)).zipWithIndex.foreach { case (e, i) =>
+      if ((i > 0 && null == e) || (i == 3 && e.asInstanceOf[Long] < 0)) {
+        throw new RuntimeException(errMsg.format(i + 1))
--- End diff --

Based on my understanding, the last three parameters are foldable. Thus, we are able to evaluate them during the plan analysis, right?
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r128482157

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   * Returns the bucket number into which
+   * the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   *                 to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   *                 to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *                  a constant indicating the number of buckets
+   * @return Returns a long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   *         the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
--- End diff --

Added 2 test cases: https://github.com/apache/spark/commit/099db6d32583dce473757a9ffc715ff4c4b44265#diff-96c9a63f4a212cd144bc18cf7063b649R670
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r128482031

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
+      0
+    } else if (expr >= upper) {
+      numBucket + 1L
--- End diff --

Added 2 test cases: https://github.com/apache/spark/pull/18323/commits/099db6d32583dce473757a9ffc715ff4c4b44265#diff-96c9a63f4a212cd144bc18cf7063b649R674
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127614336

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
+      0
+    } else if (expr >= upper) {
+      numBucket + 1L
--- End diff --

// an overflow bucket numbered num_buckets+1
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127614328

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
--- End diff --

// Creates (when needed) an underflow bucket numbered 0
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127614121

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+case class WidthBucket(
+    expr: Expression,
+    minValue: Expression,
+    maxValue: Expression,
+    numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
--- End diff --

We should not use nullSafeEval. See the answers I got from Oracle:

```
select width_bucket(col1, 0, 10, -9) from t;
ORA-30494: The argument [4] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, 0, 10, null) from t;
ORA-30494: The argument [4] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, null, 5, 9) from t;
ORA-30494: The argument [2] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, 5, null, 9) from t;
ORA-30494: The argument [3] of WIDTH_BUCKET function is NULL or invalid.
```
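The Oracle behavior quoted above errors out on NULL or invalid arguments instead of quietly returning NULL, which is why the review discourages `nullSafeEval` here. A minimal Python sketch of that eager validation (the function name and error text are illustrative, loosely mimicking Oracle's ORA-30494, and are not Spark's actual implementation):

```python
def validate_width_bucket_args(min_value, max_value, num_bucket):
    """Reject NULL bounds and non-positive bucket counts up front,
    instead of returning NULL via a null-safe evaluator."""
    # Argument positions follow the 1-based order of
    # width_bucket(expr, min_value, max_value, num_bucket).
    for pos, arg in ((2, min_value), (3, max_value), (4, num_bucket)):
        if arg is None:
            raise ValueError(
                f"The argument [{pos}] of WIDTH_BUCKET function is NULL or invalid.")
    if num_bucket <= 0:
        raise ValueError(
            "The argument [4] of WIDTH_BUCKET function is NULL or invalid.")
```

In Spark terms, this corresponds to failing the query rather than silently producing NULL rows when the bounds or bucket count are bad.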
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127613839

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
+  test("width_bucket") {
+    def test(
+        expr: Double,
+        minValue: Double,
+        maxValue: Double,
+        numBucket: Long,
+        expected: Long): Unit = {
+      checkEvaluation(WidthBucket(Literal.create(expr, DoubleType),
+        Literal.create(minValue, DoubleType),
+        Literal.create(maxValue, DoubleType),
+        Literal.create(numBucket, LongType)),
+        expected)
+    }
+
+    test(5.35, 0.024, 10.06, 5, 3)
+
+    test(3.14, 0, 4, 3, 3)
+    test(2, 0, 4, 3, 2)
+    test(-1, 0, 3.2, 4, 0)
+
+    test(3.14, 4, 0, 3, 1)
+    test(2, 4, 0, 3, 2)
+    test(-1, 3.2, 0, 4, 5)
+
+    // numBucket <= 0
+    intercept[AnalysisException] {
--- End diff --

Please capture the error message and verify it.
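Capturing the error message, rather than only asserting that some exception was raised, is a small but worthwhile pattern. A Python sketch (the `width_bucket_checked` stub is hypothetical and keeps only the validation under test; in ScalaTest the equivalent is `val e = intercept[AnalysisException] { ... }` followed by an assertion on `e.getMessage`):

```python
def width_bucket_checked(expr, min_value, max_value, num_bucket):
    # Minimal stand-in: only the error path matters for this test.
    if num_bucket <= 0:
        raise ValueError(
            f"The num of bucket must be greater than 0, but got {num_bucket}")
    return 0  # bucket computation elided

# Capture the error and verify its message, not just the exception type.
try:
    width_bucket_checked(5.35, 0.024, 10.06, -1)
    raise AssertionError("expected a failure for numBucket <= 0")
except ValueError as e:
    assert "must be greater than 0" in str(e)
    assert "-1" in str(e)
```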
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127613829

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
--- End diff --

This check needs to be moved to `case class WidthBucket`. We do not want to issue such an exception during the execution of the query.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127613753

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
--- End diff --

> Return the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127613668

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
--- End diff --

Nit: `an` -> `a`
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127601368

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
+    MathUtils.widthBucket(
+      ex.asInstanceOf[Double],
--- End diff --

A foldable expression is handled correctly.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127595777

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
+    MathUtils.widthBucket(
+      ex.asInstanceOf[Double],
--- End diff --

What happens if the input is not a constant, but a foldable expression?
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r127595711

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
+ * @param numBucket is an An expression that resolves to
--- End diff --

`an An` -> `an`
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123941530

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala ---
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.MathUtils._
+
+class MathUtilsSuite extends SparkFunSuite {
+
+  test("widthBucket") {
+    assert(widthBucket(5.35, 0.024, 10.06, 5) === 3)
+    assert(widthBucket(0, 1, 1, 1) === 0)
+    assert(widthBucket(20, 1, 1, 1) === 2)
+
+    // Test https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
+    // WIDTH_BUCKET(credit_limit, 100, 5000, 10)
+    assert(widthBucket(500, 100, 5000, 10) === 1)
+    assert(widthBucket(2300, 100, 5000, 10) === 5)
+    assert(widthBucket(3500, 100, 5000, 10) === 7)
+    assert(widthBucket(1200, 100, 5000, 10) === 3)
+    assert(widthBucket(1400, 100, 5000, 10) === 3)
+    assert(widthBucket(700, 100, 5000, 10) === 2)
+    assert(widthBucket(5000, 100, 5000, 10) === 11)
+    assert(widthBucket(1800, 100, 5000, 10) === 4)
+    assert(widthBucket(400, 100, 5000, 10) === 1)
+
+    // minValue == maxValue
+    assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Thanks!
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123937970

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala ---
+    // minValue == maxValue
+    assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Yes, this is Oracle's behavior:

![oracle-width_bucket](https://user-images.githubusercontent.com/5399861/27527840-235cc452-5a80-11e7-99ff-90c0f10e2eb7.gif)
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123936383

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala ---
+    // minValue == maxValue
+    assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Is this behavior consistent with other databases? If so, then I'm fine with this; please ignore my previous comment.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123934820

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
--- End diff --

I know, but my question is: should the possible result when minValue == maxValue only be 0, 1, or 2? E.g. should `width_bucket(2, 1, 1, 100)` = 101, or should it just throw an error for invalid input?
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123930006 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue]. For example: + * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2. 
+ */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} --- End diff -- If `minValue == maxValue`, then `lower == upper`, and the result is `numBucket + 1L`:
```
val lower: Double = Math.min(minValue, maxValue)
val upper: Double = Math.max(minValue, maxValue)

val result: Long = if (expr < lower) {
  0
} else if (expr >= upper) {
  numBucket + 1L
} else {
  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
}

if (minValue > maxValue) (numBucket - result) + 1 else result
```
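For readers who want to poke at these corner cases, the quoted logic can be exercised outside Spark. The sketch below mirrors the snippet above, with one labeled assumption: a plain `require` stands in for Spark's `AnalysisException` purely so that it runs standalone.

```scala
// Standalone sketch of the width_bucket logic quoted above.
// Assumption: `require` replaces Spark's AnalysisException only so
// this compiles and runs without a Spark dependency.
def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
  require(numBucket > 0, s"The number of buckets must be greater than 0, but got $numBucket")

  val lower: Double = math.min(minValue, maxValue)
  val upper: Double = math.max(minValue, maxValue)

  val result: Long = if (expr < lower) {
    0L                 // operand below the range
  } else if (expr >= upper) {
    numBucket + 1L     // operand at or above the range; covers minValue == maxValue
  } else {
    (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
  }

  // With reversed bounds (minValue > maxValue) the bucket index is mirrored.
  if (minValue > maxValue) (numBucket - result) + 1 else result
}
```

Under this sketch `widthBucket(2, 1, 1, 100)` evaluates to `101`, which is exactly the corner case questioned earlier in the thread.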
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123918251 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19'); -- positive/negative select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11); + +-- width_bucket +select width_bucket(5.35, 0.024, 10.06, 5); +select width_bucket(5.35, 0.024, 10.06, -5); --- End diff -- add a case for wrong input type: `select width_bucket(5.35, 0.024, 10.06, 0.5);`
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123918240 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created --- End diff -- nit: id -> is
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123919502 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue]. For example: + * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2. --- End diff -- Let's remove these examples in the description, they are just corner cases. 
My previous comment was just to make sure both ends are included.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123919350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue]. For example: + * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2. 
+ */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} --- End diff -- Do we consider minValue == maxValue and numBucket > 1 valid input or not? Please also add a test case for this.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123677893 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] --- End diff -- Yes, tested on Oracle: ``` width_Bucket(0, 1, 1, 1) -> 0, width_Bucket(20, 1, 1, 1) -> 2 ```
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123677056 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] + */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} + +val lower: Double = Math.min(minValue, maxValue) +val upper: Double = Math.max(minValue, maxValue) --- End diff -- Yes, Oracle supports it.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123408655 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression) with Serializable with ImplicitCastInputTypes { def this(child: Expression) = this(child, Literal(0)) } + +/** + * The bucket number into which --- End diff -- nit: Returns the bucket number
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123409892 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19'); -- positive/negative select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11); + +-- width_bucket --- End diff -- Instead, we need to add a test for SQL queries using this function.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123401825 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression { } } } + +/** + * An expression with four inputs and one output. The output is by default evaluated to null + * if any input is evaluated to null. + */ +abstract class QuaternaryExpression extends Expression { + + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = children.exists(_.nullable) + + /** + * Default behavior of evaluation according to the default nullability of TernaryExpression. + * If subclass of TernaryExpression override nullable, probably should also override this. + */ + override def eval(input: InternalRow): Any = { +val exprs = children +val value1 = exprs(0).eval(input) +if (value1 != null) { + val value2 = exprs(1).eval(input) + if (value2 != null) { +val value3 = exprs(2).eval(input) +if (value3 != null) { + val value4 = exprs(3).eval(input) + if (value4 != null) { +return nullSafeEval(value1, value2, value3, value4) + } +} + } +} +null + } + + /** + * Called by default [[eval]] implementation. If subclass of TernaryExpression keep the default + * nullability, they can override this method to save null-check code. If we need full control + * of evaluation process, we should override [[eval]]. + */ + protected def nullSafeEval(input1: Any, input2: Any, input3: Any, input4: Any): Any = +sys.error(s"TernaryExpressions must override either eval or nullSafeEval") + + /** + * Short hand for generating ternary evaluation code. + * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f accepts three variable names and returns Java code to compute the output. 
+ */ + protected def defineCodeGen( +ctx: CodegenContext, +ev: ExprCode, +f: (String, String, String, String) => String): ExprCode = { +nullSafeCodeGen(ctx, ev, (eval1, eval2, eval3, eval4) => { + s"${ev.value} = ${f(eval1, eval2, eval3, eval3)};" --- End diff -- the last `eval3` -> `eval4`
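The effect of that typo can be illustrated with a plain-function model of the helper (the names `genCodeBuggy`, `genCodeFixed`, and `call` below are hypothetical, for illustration only, not Spark APIs): `f` receives the four evaluated child names, so passing `eval3` twice silently substitutes the third child for the fourth in the generated source.

```scala
// Hypothetical simplified model of defineCodeGen: `f` maps the four
// evaluated child names to the Java source of the result expression.
def genCodeBuggy(f: (String, String, String, String) => String): String =
  f("eval1", "eval2", "eval3", "eval3") // the typo: eval3 passed twice

def genCodeFixed(f: (String, String, String, String) => String): String =
  f("eval1", "eval2", "eval3", "eval4") // the fix: fourth child forwarded

val call = (a: String, b: String, c: String, d: String) => s"widthBucket($a, $b, $c, $d)"
```

The buggy variant emits `widthBucket(eval1, eval2, eval3, eval3)`, i.e. the `numBucket` argument would be generated from the `maxValue` child.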
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123410671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] + */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} + +val lower: Double = Math.min(minValue, maxValue) +val upper: Double = Math.max(minValue, maxValue) + +val preResult: Long = if (expr < lower) { + 0 +} else if (expr >= upper) { + Math.addExact(numBucket, 1) +} else { + (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong +} + +val result = if (minValue > maxValue) (numBucket - preResult) + 1 else preResult +result --- End diff -- nit: `if (minValue > maxValue) (numBucket - preResult) + 1 else preResult`
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123411306 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] + */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} + +val lower: Double = Math.min(minValue, maxValue) +val upper: Double = Math.max(minValue, maxValue) + +val preResult: Long = if (expr < lower) { + 0 +} else if (expr >= upper) { + Math.addExact(numBucket, 1) +} else { + (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong --- End diff -- what if upper == lower?
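One empirical way to answer this question: when `lower == upper`, the dividing branch is unreachable, because every non-NaN `expr` satisfies either `expr < lower` or `expr >= upper`. Isolating just the quoted branch logic:

```scala
// The branch logic from the quoted snippet, isolated. When lower == upper,
// only the first two branches can fire for non-NaN input, so the division
// by (upper - lower) never actually happens.
def bucketBranch(expr: Double, lower: Double, upper: Double, numBucket: Long): Long =
  if (expr < lower) 0L
  else if (expr >= upper) numBucket + 1L
  else (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
```

With `lower == upper == 1.0` every input maps to either `0` or `numBucket + 1`.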
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123411090 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] + */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} + +val lower: Double = Math.min(minValue, maxValue) +val upper: Double = Math.max(minValue, maxValue) + +val preResult: Long = if (expr < lower) { + 0 +} else if (expr >= upper) { + Math.addExact(numBucket, 1) --- End diff -- Do we really need to use `addExact`? In Oracle's doc, `numBucket` is an integer, then we can use `numBucket + 1L` here
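For context on the two choices being weighed here: `Math.addExact` throws `ArithmeticException` on `Long` overflow, while `numBucket + 1L` wraps silently. Since `num_bucket` is an int-sized constant in Oracle's definition, the plain addition cannot realistically overflow, which is the point of the comment. A quick illustration:

```scala
// Math.addExact detects Long overflow by throwing; plain + wraps around.
def incChecked(n: Long): Long = Math.addExact(n, 1L)
def incPlain(n: Long): Long = n + 1L
```

`incPlain(Long.MaxValue)` wraps to `Long.MinValue`, whereas `incChecked(Long.MaxValue)` throws.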
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123410649 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] --- End diff -- Both endpoints are included? Can you check with other databases and add a comment here? 
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123402665 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression { } } } + +/** + * An expression with four inputs and one output. The output is by default evaluated to null + * if any input is evaluated to null. + */ +abstract class QuaternaryExpression extends Expression { + + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = children.exists(_.nullable) + + /** + * Default behavior of evaluation according to the default nullability of TernaryExpression. + * If subclass of TernaryExpression override nullable, probably should also override this. + */ + override def eval(input: InternalRow): Any = { +val exprs = children +val value1 = exprs(0).eval(input) +if (value1 != null) { + val value2 = exprs(1).eval(input) + if (value2 != null) { +val value3 = exprs(2).eval(input) +if (value3 != null) { + val value4 = exprs(3).eval(input) + if (value4 != null) { +return nullSafeEval(value1, value2, value3, value4) + } +} + } +} +null + } + + /** + * Called by default [[eval]] implementation. If subclass of TernaryExpression keep the default + * nullability, they can override this method to save null-check code. If we need full control + * of evaluation process, we should override [[eval]]. + */ + protected def nullSafeEval(input1: Any, input2: Any, input3: Any, input4: Any): Any = +sys.error(s"TernaryExpressions must override either eval or nullSafeEval") + + /** + * Short hand for generating ternary evaluation code. + * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f accepts three variable names and returns Java code to compute the output. 
+ */ + protected def defineCodeGen( +ctx: CodegenContext, +ev: ExprCode, +f: (String, String, String, String) => String): ExprCode = { +nullSafeCodeGen(ctx, ev, (eval1, eval2, eval3, eval4) => { + s"${ev.value} = ${f(eval1, eval2, eval3, eval3)};" +}) + } + + /** + * Short hand for generating ternary evaluation code. + * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f function that accepts the 3 non-null evaluation result names of children --- End diff -- 3 -> 4
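For readers following the thread, the null-propagation pattern in the quoted `QuaternaryExpression.eval` (short-circuit to SQL NULL as soon as any child evaluates to null) is easier to see outside the codegen machinery. A minimal Python sketch of the same control flow, assuming children are modeled as plain callables (names here are illustrative, not from the patch):

```python
def quaternary_eval(children, row, null_safe_eval):
    # Mirrors the quoted eval(): evaluate the four children left to right
    # and return None (SQL NULL) as soon as any child yields None.
    assert len(children) == 4
    values = []
    for child in children:
        v = child(row)
        if v is None:
            return None  # null input short-circuits the whole expression
        values.append(v)
    # All four inputs are non-null: delegate to the null-safe implementation.
    return null_safe_eval(*values)

# Example: a 4-input expression over constant children.
result = quaternary_eval(
    [lambda r: 5.35, lambda r: 0.024, lambda r: 10.06, lambda r: 5],
    row=None,
    null_safe_eval=lambda a, b, c, d: (a, b, c, d))
```

This is the same ordering guarantee as the Scala code: later children are never evaluated once an earlier one returns null.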
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123408856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. + * + * @param expr id the expression for which the histogram is being created --- End diff -- id -> is
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123409716 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19'); -- positive/negative select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11); + +-- width_bucket --- End diff -- Do we need end-to-end tests here? I think we already cover these cases in other test suites.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123410449 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala --- @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.sql.AnalysisException + +object MathUtils { + + /** + * Returns the bucket number into which + * the value of this expression would fall after being evaluated. 
+ * + * @param expr id the expression for which the histogram is being created + * @param minValue is an expression that resolves + * to the minimum end point of the acceptable range for expr + * @param maxValue is an expression that resolves + * to the maximum end point of the acceptable range for expr + * @param numBucket is an An expression that resolves to + * a constant indicating the number of buckets + * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by + * the range [minValue, maxValue] + */ + def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = { + +if (numBucket <= 0) { + throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}") +} + +val lower: Double = Math.min(minValue, maxValue) +val upper: Double = Math.max(minValue, maxValue) --- End diff -- Do other databases allow the max value to appear first? i.e. `widthBucket(3.14, 4, 0, 3)`
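The WIDTH_BUCKET semantics under discussion, including the reversed-bounds case the reviewer asks about, can be sketched as follows. This is an illustrative Python translation of the behavior described in the Oracle docs linked in the thread, not the actual Spark implementation:

```python
import math

def width_bucket(expr, min_value, max_value, num_bucket):
    # Hypothetical sketch of WIDTH_BUCKET semantics for illustration;
    # names and structure are not taken from the patch under review.
    if num_bucket <= 0:
        raise ValueError("num_bucket must be greater than 0, got %s" % num_bucket)
    if min_value < max_value:
        # Ascending range: bucket 0 is underflow, num_bucket + 1 is overflow.
        if expr < min_value:
            return 0
        if expr >= max_value:
            return num_bucket + 1
        return int(math.floor((expr - min_value) / (max_value - min_value) * num_bucket)) + 1
    else:
        # Descending range (min_value given first but larger): bounds reversed.
        if expr > min_value:
            return 0
        if expr <= max_value:
            return num_bucket + 1
        return int(math.floor((min_value - expr) / (min_value - max_value) * num_bucket)) + 1
```

On the endpoint question raised above: under this sketch the lower end point is inclusive (it maps to bucket 1), while a value equal to the upper end point falls into the overflow bucket `num_bucket + 1`, which matches the test values quoted elsewhere in the thread (e.g. `width_bucket(100, 100, 5000, 10)` gives 1 and `width_bucket(5000, 100, 5000, 10)` gives 11).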
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123412204 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala --- @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.util + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.util.MathUtils._ + +class MathUtilsSuite extends SparkFunSuite { + + test("widthBucket") { +assert(widthBucket(5.35, 0.024, 10.06, 5) === 3) +assert(widthBucket(99, 100, 5000, 10) === 0) +assert(widthBucket(100, 100, 5000, 10) === 1) +assert(widthBucket(590, 100, 5000, 10) === 2) +assert(widthBucket(5000, 100, 5000, 10) === 11) +assert(widthBucket(6000, 100, 5000, 10) === 11) --- End diff -- Can you use the same cases from Oracle's doc? just to make sure we are getting the same results here.
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123409299 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala --- @@ -644,4 +645,36 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(BRound(-0.35, 1), -0.4) checkEvaluation(BRound(-35, -1), -40) } + + test("width_bucket") { +def test( + expr: Double, + minValue: Double, + maxValue: Double, + numBucket: Long, + expected: Long): Unit = { + checkEvaluation(WidthBucket(Literal.create(expr, DoubleType), +Literal.create(minValue, DoubleType), +Literal.create(maxValue, DoubleType), +Literal.create(numBucket, LongType)), +expected) +} + +test(5.35, 0.024, 10.06, 5, 3) + +test(3.14, 0, 4, 3, 3) +test(2, 0, 4, 3, 2) +test(-1, 0, 3.2, 4, 0) + +test(3.14, 4, 0, 3, 1) +test(2, 4, 0, 3, 2) +test(-1, 3.2, 0, 4, 5) + +intercept[AnalysisException]{ --- End diff -- also add test cases for null input and wrong input type
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r123401878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression { } } } + +/** + * An expression with four inputs and one output. The output is by default evaluated to null + * if any input is evaluated to null. + */ +abstract class QuaternaryExpression extends Expression { + + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = children.exists(_.nullable) + + /** + * Default behavior of evaluation according to the default nullability of TernaryExpression. --- End diff -- `TernaryExpression` -> `QuaternaryExpression`, please also fix other places
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/18323 [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET ## What changes were proposed in this pull request? Add built-in SQL function - `WIDTH_BUCKET` Ref: https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717 ## How was this patch tested? Unit tests and manual tests. Manual tests (compare `spark-sql` results with `oracle`): ```sql select width_bucket(5.35, 0.024, 10.06, 5) from dual; -- 3 -- select width_bucket(3.14, 0, 4, 3) from dual; -- 3 -- select width_bucket(2, 0, 4, 3) from dual; -- 2 -- select width_bucket(-1, 0, 3.2, 4) from dual; -- 0 -- select width_bucket(3.14, 4, 0, 3) from dual; -- 1 -- select width_bucket(2, 4, 0, 3) from dual; -- 2 -- select width_bucket(-1, 3.2, 0, 4) from dual; -- 5 -- ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-21117 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18323.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18323 commit 20fe5679735380b931fb3eab6841c1da376c1742 Author: Yuming Wang Date: 2017-06-16T06:28:18Z Built-in SQL Function Support - WIDTH_BUCKET