[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2018-11-10 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/18323





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-08-01 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130526899
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
--- End diff --

Comments copied from https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
 





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-08-01 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130526538
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
+
+  private lazy val _maxValue: Any = maxValue.eval(EmptyRow)
+  private lazy val maxValueV = _maxValue.asInstanceOf[Double]
+
+  private lazy val _numBucket: Any = numBucket.eval(EmptyRow)
+  private lazy val numBucketV = _numBucket.asInstanceOf[Long]
+
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
--- End diff --

To be consistent with Oracle:

![oracle-width_bucket](https://user-images.githubusercontent.com/5399861/28811683-65c7c5a8-76c3-11e7-990b-4e2ca41eca6e.jpeg)






[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130038008
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
--- End diff --

minValue.eval()?





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130037738
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
+
+  private lazy val _maxValue: Any = maxValue.eval(EmptyRow)
+  private lazy val maxValueV = _maxValue.asInstanceOf[Double]
+
+  private lazy val _numBucket: Any = numBucket.eval(EmptyRow)
+  private lazy val numBucketV = _numBucket.asInstanceOf[Long]
+
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
--- End diff --

Is the above handling for null valid? Except for null-tolerant expressions, the convention is that when any input of an expression evaluates to null, the expression returns null.
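
For reference, a minimal sketch of that convention (illustrative only, not code from this PR): a default `eval` for a four-input expression short-circuits to null as soon as any child evaluates to null, and only then calls `nullSafeEval`.

```scala
// Illustrative null-propagation sketch for a four-input expression.
override def eval(input: InternalRow): Any = {
  val v1 = expr.eval(input)
  if (v1 == null) null else {
    val v2 = minValue.eval(input)
    if (v2 == null) null else {
      val v3 = maxValue.eval(input)
      if (v3 == null) null else {
        val v4 = numBucket.eval(input)
        if (v4 == null) null else nullSafeEval(v1, v2, v3, v4)
      }
    }
  }
}
```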





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130036804
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
@@ -631,3 +632,109 @@ abstract class TernaryExpression extends Expression {
 }
   }
 }
+
+/**
+ * An expression with four inputs and one output. The output is by default evaluated to null
+ * if any input is evaluated to null.
+ */
+abstract class QuaternaryExpression extends Expression {
--- End diff --

We can remove this based on your current code.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130036608
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
+
+  private lazy val _maxValue: Any = maxValue.eval(EmptyRow)
+  private lazy val maxValueV = _maxValue.asInstanceOf[Double]
+
+  private lazy val _numBucket: Any = numBucket.eval(EmptyRow)
+  private lazy val numBucketV = _numBucket.asInstanceOf[Long]
+
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+
+    if (isFoldable) {
+      if (_minValue == null) {
+        throw new RuntimeException(errMsg.format(2))
+      } else if (_maxValue == null) {
+        throw new RuntimeException(errMsg.format(3))
+      } else if (_numBucket == null || numBucketV <= 0) {
+        throw new RuntimeException(errMsg.format(4))
+      } else {
+        val exprV = expr.eval(input)
+        if (exprV == null) {
+          null
+        } else {
+          MathUtils.widthBucket(exprV.asInstanceOf[Double], minValueV, maxValueV, numBucketV)
+        }
+      }
+    } else {
+      val evals = children.map(_.eval(input))
+      val invalid = evals.zipWithIndex.filter { case (e, i) =>
+        (i > 0 && e == null) || (i == 3 && e.asInstanceOf[Long] <= 0)
+      }
+      if (invalid.nonEmpty) {
+        invalid.foreach(l => throw new RuntimeException(errMsg.format(l._2 + 1)))
+      } else if (evals(0) == null) {
+        null
+      } else {
+        MathUtils.widthBucket(
+          evals(0).asInstanceOf[Double],
+          evals(1).asInstanceOf[Double],
+          evals(2).asInstanceOf[Double],
+          evals(3).asInstanceOf[Long])
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
--- End diff --

Ditto. As you override `doGenCode`, I don't see the reason why you create `QuaternaryExpression` for this expression.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130036586
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
+
+  private lazy val _maxValue: Any = maxValue.eval(EmptyRow)
+  private lazy val maxValueV = _maxValue.asInstanceOf[Double]
+
+  private lazy val _numBucket: Any = numBucket.eval(EmptyRow)
+  private lazy val numBucketV = _numBucket.asInstanceOf[Long]
+
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
--- End diff --

Why do you create `QuaternaryExpression` with a default `nullSafeEval` but not use it? If you want to do so, you don't need `QuaternaryExpression`; simply let `WidthBucket` extend `Expression`.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130035216
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,124 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
--- End diff --

Is this comment correct? How about "the expression for which the bucket number in the histogram would be returned"?





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-28 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r130021930
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1219,44 +1219,91 @@ case class WidthBucket(
   override def dataType: DataType = LongType
   override def nullable: Boolean = true
 
+  private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable
+
+  private lazy val _minValue: Any = minValue.eval(EmptyRow)
+  private lazy val minValueV = _minValue.asInstanceOf[Double]
--- End diff --

If `minValue.eval(EmptyRow) == null`, then `minValue.eval(EmptyRow).asInstanceOf[Double]` will be `0.0`, so keep both of them here.
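
A quick REPL check of the behavior relied on here (casting a null reference to a primitive `Double` in Scala unboxes it to `0.0` instead of throwing), which is why the raw `Any` result must be kept for the null test:

```scala
scala> val v: Any = null
v: Any = null

scala> v.asInstanceOf[Double]  // unboxes null to the primitive default
res0: Double = 0.0
```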





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r128569724
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,77 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns the `bucket` to which operand would be assigned in an equidepth histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  private val errMsg = "The argument [%d] of WIDTH_BUCKET function is NULL or invalid."
+
+  override def eval(input: InternalRow): Any = {
+
+    children.map(_.eval(input)).zipWithIndex.foreach { case (e, i) =>
+      if ((i > 0 && null == e) || (i == 3 && e.asInstanceOf[Long] < 0)) {
+        throw new RuntimeException(errMsg.format(i + 1))
--- End diff --

Based on my understanding, the last three parameters are foldable. Thus, we 
are able to evaluate them during the plan analysis, right?
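
A minimal sketch of that idea (illustrative; the message text is the one quoted above): since a foldable child references no input columns, it can be evaluated once against `EmptyRow` and validated before any row is processed.

```scala
// Sketch: validate a foldable numBucket once, up front, instead of per row.
if (numBucket.foldable) {
  val n = numBucket.eval(EmptyRow) // no input row needed for a foldable child
  if (n == null || n.asInstanceOf[Long] <= 0) {
    throw new AnalysisException("The argument [4] of WIDTH_BUCKET function is NULL or invalid.")
  }
}
```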





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-20 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r128482157
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   * the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
--- End diff --

Added 2 test cases:

https://github.com/apache/spark/commit/099db6d32583dce473757a9ffc715ff4c4b44265#diff-96c9a63f4a212cd144bc18cf7063b649R670





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-20 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r128482031
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   * the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
+      0
+    } else if (expr >= upper) {
+      numBucket + 1L
--- End diff --

Added 2 test cases:

https://github.com/apache/spark/pull/18323/commits/099db6d32583dce473757a9ffc715ff4c4b44265#diff-96c9a63f4a212cd144bc18cf7063b649R674





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127614336
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   * the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
+      0
+    } else if (expr >= upper) {
+      numBucket + 1L
--- End diff --

// an overflow bucket numbered num_buckets+1
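
Putting the underflow and overflow comments together, a self-contained sketch of the computation for an ascending range (`minValue < maxValue`; the PR's tests show that a descending range reverses the bucket numbering, which this sketch does not cover):

```scala
// A minimal sketch, assuming minValue < maxValue.
// Bucket 0 is the underflow bucket, bucket numBucket + 1 the overflow bucket;
// values in [minValue, maxValue) map to buckets 1..numBucket.
def widthBucketSketch(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
  require(numBucket > 0, s"The num of bucket must be greater than 0, but got $numBucket")
  if (expr < minValue) 0L
  else if (expr >= maxValue) numBucket + 1L
  else (numBucket * (expr - minValue) / (maxValue - minValue)).toLong + 1L
}

widthBucketSketch(5.35, 0.024, 10.06, 5)  // 3, matching the _FUNC_ example above
```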





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127614328
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   * the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
+    }
+
+    val lower: Double = Math.min(minValue, maxValue)
+    val upper: Double = Math.max(minValue, maxValue)
+
+    val result: Long = if (expr < lower) {
--- End diff --

// Creates (when needed) an underflow bucket numbered 0





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127614121
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
--- End diff --

We should not use nullSafeEval. See the answers I got from Oracle.
```
select width_bucket(col1, 0, 10, -9) from t;
ORA-30494: The argument [4] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, 0, 10, null) from t;
ORA-30494: The argument [4] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, null, 5, 9) from t;
ORA-30494: The argument [2] of WIDTH_BUCKET function is NULL or invalid.

select width_bucket(col1, 5, null, 9) from t;
ORA-30494: The argument [3] of WIDTH_BUCKET function is NULL or invalid.
```





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127613839
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala ---
@@ -644,4 +645,37 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(BRound(-0.35, 1), -0.4)
     checkEvaluation(BRound(-35, -1), -40)
   }
+
+  test("width_bucket") {
+    def test(
+        expr: Double,
+        minValue: Double,
+        maxValue: Double,
+        numBucket: Long,
+        expected: Long): Unit = {
+      checkEvaluation(WidthBucket(Literal.create(expr, DoubleType),
+        Literal.create(minValue, DoubleType),
+        Literal.create(maxValue, DoubleType),
+        Literal.create(numBucket, LongType)),
+        expected)
+    }
+
+    test(5.35, 0.024, 10.06, 5, 3)
+
+    test(3.14, 0, 4, 3, 3)
+    test(2, 0, 4, 3, 2)
+    test(-1, 0, 3.2, 4, 0)
+
+    test(3.14, 4, 0, 3, 1)
+    test(2, 4, 0, 3, 2)
+    test(-1, 3.2, 0, 4, 5)
+
+    // numBucket <= 0
+    intercept[AnalysisException]{
--- End diff --

Please capture the error message and verify it. 
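
For example, something along these lines against the `MathUtils.widthBucket` helper quoted earlier (a sketch, assuming the message text from that helper):

```scala
// Sketch: capture the exception and check its message, not just its type.
val e = intercept[AnalysisException] {
  widthBucket(5.35, 0.024, 10.06, -5)
}
assert(e.getMessage.contains("The num of bucket must be greater than 0"))
```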





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127613829
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr is the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for expr
+   * @param numBucket is an expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr into buckets defined by
+   * the range [minValue, maxValue].
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
+
+    if (numBucket <= 0) {
+      throw new AnalysisException(s"The num of bucket must be greater than 0, but got ${numBucket}")
--- End diff --

This check needs to be moved to `case class WidthBucket`. We do not want to 
issue such an exception during the execution of the query. 





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127613753
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
--- End diff --

> Return the `bucket` to which operand would be assigned in an equidepth 
histogram with `num_bucket` buckets, in the range `min_value` to `max_value`.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127613668
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
--- End diff --

Nit: `an` -> `a`





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-16 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127601368
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an An expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
+    MathUtils.widthBucket(
+      ex.asInstanceOf[Double],
--- End diff --

A foldable expression is handled correctly.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127595777
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an An expression that resolves to
+ *  a constant indicating the number of buckets
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, min_value, max_value, num_bucket) - Returns an long between 0 and `num_buckets`+1 by mapping the `expr` into buckets defined by the range [`min_value`, `max_value`].",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(5.35, 0.024, 10.06, 5);
+       3
+  """)
+// scalastyle:on line.size.limit
+case class WidthBucket(
+  expr: Expression,
+  minValue: Expression,
+  maxValue: Expression,
+  numBucket: Expression) extends QuaternaryExpression with ImplicitCastInputTypes {
+
+  override def children: Seq[Expression] = Seq(expr, minValue, maxValue, numBucket)
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType, DoubleType, LongType)
+  override def dataType: DataType = LongType
+  override def nullable: Boolean = true
+
+  override def nullSafeEval(ex: Any, min: Any, max: Any, num: Any): Any = {
+    MathUtils.widthBucket(
+      ex.asInstanceOf[Double],
--- End diff --

What happens if the input is not a constant, but a foldable expression?





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-07-15 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r127595711
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  Returns the bucket number into which
+ *  the value of this expression would fall after being evaluated.
+ *
+ * @param expr is the expression for which the histogram is being created
+ * @param minValue is an expression that resolves
+ * to the minimum end point of the acceptable range for expr
+ * @param maxValue is an expression that resolves
+ * to the maximum end point of the acceptable range for expr
+ * @param numBucket is an An expression that resolves to
--- End diff --

`an An` -> `an`





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123941530
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.MathUtils._
+
+class MathUtilsSuite extends SparkFunSuite {
+
+  test("widthBucket") {
+    assert(widthBucket(5.35, 0.024, 10.06, 5) === 3)
+    assert(widthBucket(0, 1, 1, 1) === 0)
+    assert(widthBucket(20, 1, 1, 1) === 2)
+
+    // Test https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
+    // WIDTH_BUCKET(credit_limit, 100, 5000, 10)
+    assert(widthBucket(500, 100, 5000, 10) === 1)
+    assert(widthBucket(2300, 100, 5000, 10) === 5)
+    assert(widthBucket(3500, 100, 5000, 10) === 7)
+    assert(widthBucket(1200, 100, 5000, 10) === 3)
+    assert(widthBucket(1400, 100, 5000, 10) === 3)
+    assert(widthBucket(700, 100, 5000, 10) === 2)
+    assert(widthBucket(5000, 100, 5000, 10) === 11)
+    assert(widthBucket(1800, 100, 5000, 10) === 4)
+    assert(widthBucket(400, 100, 5000, 10) === 1)
+
+    // minValue == maxValue
+    assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Thanks!





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-26 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123937970
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.MathUtils._
+
+class MathUtilsSuite extends SparkFunSuite {
+
+  test("widthBucket") {
+assert(widthBucket(5.35, 0.024, 10.06, 5) === 3)
+assert(widthBucket(0, 1, 1, 1) === 0)
+assert(widthBucket(20, 1, 1, 1) === 2)
+
+// Test 
https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
+// WIDTH_BUCKET(credit_limit, 100, 5000, 10)
+assert(widthBucket(500, 100, 5000, 10) === 1)
+assert(widthBucket(2300, 100, 5000, 10) === 5)
+assert(widthBucket(3500, 100, 5000, 10) === 7)
+assert(widthBucket(1200, 100, 5000, 10) === 3)
+assert(widthBucket(1400, 100, 5000, 10) === 3)
+assert(widthBucket(700, 100, 5000, 10) === 2)
+assert(widthBucket(5000, 100, 5000, 10) === 11)
+assert(widthBucket(1800, 100, 5000, 10) === 4)
+assert(widthBucket(400, 100, 5000, 10) === 1)
+
+// minValue == maxValue
+assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Yes, this is Oracle's behavior:

![oracle-width_bucket](https://user-images.githubusercontent.com/5399861/27527840-235cc452-5a80-11e7-99ff-90c0f10e2eb7.gif)






[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123936383
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala
 ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.MathUtils._
+
+class MathUtilsSuite extends SparkFunSuite {
+
+  test("widthBucket") {
+assert(widthBucket(5.35, 0.024, 10.06, 5) === 3)
+assert(widthBucket(0, 1, 1, 1) === 0)
+assert(widthBucket(20, 1, 1, 1) === 2)
+
+// Test 
https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717
+// WIDTH_BUCKET(credit_limit, 100, 5000, 10)
+assert(widthBucket(500, 100, 5000, 10) === 1)
+assert(widthBucket(2300, 100, 5000, 10) === 5)
+assert(widthBucket(3500, 100, 5000, 10) === 7)
+assert(widthBucket(1200, 100, 5000, 10) === 3)
+assert(widthBucket(1400, 100, 5000, 10) === 3)
+assert(widthBucket(700, 100, 5000, 10) === 2)
+assert(widthBucket(5000, 100, 5000, 10) === 11)
+assert(widthBucket(1800, 100, 5000, 10) === 4)
+assert(widthBucket(400, 100, 5000, 10) === 1)
+
+// minValue == maxValue
+assert(widthBucket(10, 4, 4, 15) === 16)
--- End diff --

Is this behavior consistent with other databases? If so, I'm fine with this; please ignore my previous comment.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123934820
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
--- End diff --

I know, but my question is: should the possible result when minValue == maxValue be only 0, 1, or 2? E.g., should `width_bucket(2, 1, 1, 100)` return 101, or should we just throw an error for invalid input?
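
For concreteness, a minimal sketch of the Oracle-compatible rule that wangyum spells out further down in this thread, under which the answer would be 101 rather than an error (`widthBucketSketch` is a hypothetical name; the reversed-bounds adjustment is omitted since it is a no-op when minValue == maxValue):

```scala
// Minimal sketch of the Oracle-compatible rule, using the branch structure
// quoted elsewhere in this thread. With minValue == maxValue, lower == upper,
// so any expr >= upper lands in the overflow bucket numBucket + 1.
def widthBucketSketch(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
  val lower = math.min(minValue, maxValue)
  val upper = math.max(minValue, maxValue)
  if (expr < lower) 0L
  else if (expr >= upper) numBucket + 1L
  else (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
}

widthBucketSketch(2, 1, 1, 100)  // 101, i.e. numBucket + 1
```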





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123930006
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
--- End diff --

If `minValue == maxValue`, then `lower == upper`, and the result is `numBucket + 1L`:
```
val lower: Double = Math.min(minValue, maxValue)
val upper: Double = Math.max(minValue, maxValue)

val result: Long = if (expr < lower) {
  0
} else if (expr >= upper) {
  numBucket + 1L
} else {
  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
}

if (minValue > maxValue) (numBucket - result) + 1 else result
```
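
Wrapping the snippet above in a hypothetical standalone function makes the two edge cases easy to trace; the expected values below come from the test suite and the Oracle comparisons in the PR description quoted later in this archive:

```scala
// Hypothetical wrapper around the snippet above, for tracing only.
def widthBucket(expr: Double, minValue: Double, maxValue: Double, numBucket: Long): Long = {
  val lower: Double = Math.min(minValue, maxValue)
  val upper: Double = Math.max(minValue, maxValue)
  val result: Long = if (expr < lower) {
    0
  } else if (expr >= upper) {
    numBucket + 1L
  } else {
    (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
  }
  if (minValue > maxValue) (numBucket - result) + 1 else result
}

widthBucket(10, 4, 4, 15)   // 16: lower == upper, so expr >= upper gives numBucket + 1
widthBucket(-1, 3.2, 0, 4)  // 5: bounds reversed, so bucket 0 is remapped to (4 - 0) + 1
```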





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123918251
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19');
 
 -- positive/negative
 select positive('-1.11'), positive(-1.11), negative('-1.11'), 
negative(-1.11);
+
+-- width_bucket
+select width_bucket(5.35, 0.024, 10.06, 5);
+select width_bucket(5.35, 0.024, 10.06, -5);
--- End diff --

Add a case for wrong input type: `select width_bucket(5.35, 0.024, 10.06, 0.5);`





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123918240
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
--- End diff --

nit: id -> is





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123919502
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
--- End diff --

Let's remove these examples from the description; they are just corner cases. My previous comment was just to make sure both ends are included.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-25 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123919350
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]. For example:
+   * widthBucket(0, 1, 1, 1) -> 0, widthBucket(20, 1, 1, 1) -> 2.
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
--- End diff --

Do we consider minValue == maxValue and numBucket > 1 valid input or not?
Please also add a test case for this.
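
For reference, the suite quoted further up in this archive ended up treating it as valid input; the relevant assertions (inside `MathUtilsSuite`) are:

```scala
// minValue == maxValue is accepted; the value maps to bucket 0 or numBucket + 1.
assert(widthBucket(10, 4, 4, 15) === 16)  // expr at/above the degenerate bound -> numBucket + 1
assert(widthBucket(0, 1, 1, 1) === 0)     // expr below the degenerate bound -> 0
assert(widthBucket(20, 1, 1, 1) === 2)    // expr at/above the degenerate bound -> numBucket + 1
```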





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-23 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123677893
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
--- End diff --

Yes, tested on Oracle:
```
width_Bucket(0, 1, 1, 1) -> 0, width_Bucket(20, 1, 1, 1) -> 2
```





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-22 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123677056
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
+
+val lower: Double = Math.min(minValue, maxValue)
+val upper: Double = Math.max(minValue, maxValue)
--- End diff --

Yes, Oracle supports it.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123408655
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -1186,3 +1186,51 @@ case class BRound(child: Expression, scale: 
Expression)
 with Serializable with ImplicitCastInputTypes {
   def this(child: Expression) = this(child, Literal(0))
 }
+
+/**
+ *  The bucket number into which
--- End diff --

nit: Returns the bucket number





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123409892
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19');
 
 -- positive/negative
 select positive('-1.11'), positive(-1.11), negative('-1.11'), 
negative(-1.11);
+
+-- width_bucket
--- End diff --

Instead, we need to add a test for SQL queries using this function.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123401825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression {
 }
   }
 }
+
+/**
+ * An expression with four inputs and one output. The output is by default 
evaluated to null
+ * if any input is evaluated to null.
+ */
+abstract class QuaternaryExpression extends Expression {
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of TernaryExpression.
+   * If subclass of TernaryExpression override nullable, probably should 
also override this.
+   */
+  override def eval(input: InternalRow): Any = {
+val exprs = children
+val value1 = exprs(0).eval(input)
+if (value1 != null) {
+  val value2 = exprs(1).eval(input)
+  if (value2 != null) {
+val value3 = exprs(2).eval(input)
+if (value3 != null) {
+  val value4 = exprs(3).eval(input)
+  if (value4 != null) {
+return nullSafeEval(value1, value2, value3, value4)
+  }
+}
+  }
+}
+null
+  }
+
+  /**
+   * Called by default [[eval]] implementation.  If subclass of 
TernaryExpression keep the default
+   * nullability, they can override this method to save null-check code.  
If we need full control
+   * of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(input1: Any, input2: Any, input3: Any, 
input4: Any): Any =
+sys.error(s"TernaryExpressions must override either eval or 
nullSafeEval")
+
+  /**
+   * Short hand for generating ternary evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts three variable names and returns Java code to 
compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: (String, String, String, String) => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, (eval1, eval2, eval3, eval4) => {
+  s"${ev.value} = ${f(eval1, eval2, eval3, eval3)};"
--- End diff --

the last `eval3` -> `eval4`
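
With the fix applied, a sketch of the corrected method reads as follows; only the last argument passed to `f` changes:

```scala
// Corrected sketch: the fourth child's evaluation result must be
// forwarded as the fourth argument, eval4 instead of eval3.
protected def defineCodeGen(
    ctx: CodegenContext,
    ev: ExprCode,
    f: (String, String, String, String) => String): ExprCode = {
  nullSafeCodeGen(ctx, ev, (eval1, eval2, eval3, eval4) => {
    s"${ev.value} = ${f(eval1, eval2, eval3, eval4)};"
  })
}
```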





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123410671
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
+
+val lower: Double = Math.min(minValue, maxValue)
+val upper: Double = Math.max(minValue, maxValue)
+
+val preResult: Long = if (expr < lower) {
+  0
+} else if (expr >= upper) {
+  Math.addExact(numBucket, 1)
+} else {
+  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
+}
+
+val result = if (minValue > maxValue) (numBucket - preResult) + 1 else 
preResult
+result
--- End diff --

nit: `if (minValue > maxValue) (numBucket - preResult) + 1 else preResult`





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123411306
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
+
+val lower: Double = Math.min(minValue, maxValue)
+val upper: Double = Math.max(minValue, maxValue)
+
+val preResult: Long = if (expr < lower) {
+  0
+} else if (expr >= upper) {
+  Math.addExact(numBucket, 1)
+} else {
+  (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong
--- End diff --

What if `upper == lower`?
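
A quick sketch of why the zero-denominator division is unreachable in that case: the first two branches already cover every input, so the else branch would need `expr >= lower && expr < upper`, which is impossible when `lower == upper`:

```scala
// With lower == upper, the else branch requires expr >= lower && expr < upper,
// which is impossible, so (upper - lower) == 0 is never used as a divisor.
val (lower, upper, numBucket) = (4.0, 4.0, 15L)
val expr = 10.0
val result =
  if (expr < lower) 0L
  else if (expr >= upper) numBucket + 1L  // catches expr == lower == upper too
  else (numBucket.toDouble * (expr - lower) / (upper - lower) + 1).toLong  // unreachable here
// result == 16
```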





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123411090
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
+
+val lower: Double = Math.min(minValue, maxValue)
+val upper: Double = Math.max(minValue, maxValue)
+
+val preResult: Long = if (expr < lower) {
+  0
+} else if (expr >= upper) {
+  Math.addExact(numBucket, 1)
--- End diff --

Do we really need to use `addExact`? In Oracle's doc, `numBucket` is an integer, so we can use `numBucket + 1L` here.
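
For context on the trade-off (a general JVM point, not specific to this PR): `Math.addExact` throws `ArithmeticException` on Long overflow, while plain `+ 1L` wraps silently:

```scala
// Plain addition wraps; Math.addExact fails fast on overflow.
val n = Long.MaxValue
val wrapped = n + 1L  // Long.MinValue, no error
val checked =
  try Math.addExact(n, 1L)
  catch { case _: ArithmeticException => -1L }  // overflow detected
```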





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123410649
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
--- End diff --

Are both endpoints included? Can you check with other databases and add a comment here?
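
The Oracle-derived cases quoted elsewhere in this thread settle it: the lower endpoint falls in bucket 1, while a value equal to the upper endpoint spills into the overflow bucket `numBucket + 1`, i.e. the bucketed range behaves as half-open `[minValue, maxValue)`:

```scala
// From the Oracle credit_limit cases in MathUtilsSuite:
assert(widthBucket(100, 100, 5000, 10) === 1)    // expr == minValue -> bucket 1
assert(widthBucket(5000, 100, 5000, 10) === 11)  // expr == maxValue -> overflow bucket
```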





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123402665
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression {
 }
   }
 }
+
+/**
+ * An expression with four inputs and one output. The output is by default 
evaluated to null
+ * if any input is evaluated to null.
+ */
+abstract class QuaternaryExpression extends Expression {
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of TernaryExpression.
+   * If subclass of TernaryExpression override nullable, probably should 
also override this.
+   */
+  override def eval(input: InternalRow): Any = {
+val exprs = children
+val value1 = exprs(0).eval(input)
+if (value1 != null) {
+  val value2 = exprs(1).eval(input)
+  if (value2 != null) {
+val value3 = exprs(2).eval(input)
+if (value3 != null) {
+  val value4 = exprs(3).eval(input)
+  if (value4 != null) {
+return nullSafeEval(value1, value2, value3, value4)
+  }
+}
+  }
+}
+null
+  }
+
+  /**
+   * Called by default [[eval]] implementation.  If subclass of 
TernaryExpression keep the default
+   * nullability, they can override this method to save null-check code.  
If we need full control
+   * of evaluation process, we should override [[eval]].
+   */
+  protected def nullSafeEval(input1: Any, input2: Any, input3: Any, 
input4: Any): Any =
+sys.error(s"TernaryExpressions must override either eval or 
nullSafeEval")
+
+  /**
+   * Short hand for generating ternary evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f accepts three variable names and returns Java code to 
compute the output.
+   */
+  protected def defineCodeGen(
+ctx: CodegenContext,
+ev: ExprCode,
+f: (String, String, String, String) => String): ExprCode = {
+nullSafeCodeGen(ctx, ev, (eval1, eval2, eval3, eval4) => {
+  s"${ev.value} = ${f(eval1, eval2, eval3, eval3)};"
+})
+  }
+
+  /**
+   * Short hand for generating ternary evaluation code.
+   * If either of the sub-expressions is null, the result of this 
computation
+   * is assumed to be null.
+   *
+   * @param f function that accepts the 3 non-null evaluation result names 
of children
--- End diff --

3 -> 4





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123408856
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
--- End diff --

id -> is





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123409716
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -92,3 +92,8 @@ select abs(-3.13), abs('-2.19');
 
 -- positive/negative
 select positive('-1.11'), positive(-1.11), negative('-1.11'), 
negative(-1.11);
+
+-- width_bucket
--- End diff --

Do we need end-to-end tests here? I think we already cover these cases in 
other test suites.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123410449
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/MathUtils.scala 
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.sql.AnalysisException
+
+object MathUtils {
+
+  /**
+   *  Returns the bucket number into which
+   *  the value of this expression would fall after being evaluated.
+   *
+   * @param expr id the expression for which the histogram is being created
+   * @param minValue is an expression that resolves
+   * to the minimum end point of the acceptable range for 
expr
+   * @param maxValue is an expression that resolves
+   * to the maximum end point of the acceptable range for 
expr
+   * @param numBucket is an An expression that resolves to
+   *  a constant indicating the number of buckets
+   * @return Returns an long between 0 and numBucket+1 by mapping the expr 
into buckets defined by
+   * the range [minValue, maxValue]
+   */
+  def widthBucket(expr: Double, minValue: Double, maxValue: Double, 
numBucket: Long): Long = {
+
+if (numBucket <= 0) {
+  throw new AnalysisException(s"The num of bucket must be greater than 
0, but got ${numBucket}")
+}
+
+val lower: Double = Math.min(minValue, maxValue)
+val upper: Double = Math.max(minValue, maxValue)
--- End diff --

Do other databases allow the max value to appear first? I.e., `widthBucket(3.14, 4, 0, 3)`?





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123412204
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/MathUtilsSuite.scala
 ---
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.MathUtils._
+
+class MathUtilsSuite extends SparkFunSuite {
+
+  test("widthBucket") {
+assert(widthBucket(5.35, 0.024, 10.06, 5) === 3)
+assert(widthBucket(99, 100, 5000, 10) === 0)
+assert(widthBucket(100, 100, 5000, 10) === 1)
+assert(widthBucket(590, 100, 5000, 10) === 2)
+assert(widthBucket(5000, 100, 5000, 10) === 11)
+assert(widthBucket(6000, 100, 5000, 10) === 11)
--- End diff --

Can you use the same cases from Oracle's doc? Just to make sure we are getting the same results here.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123409299
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala
 ---
@@ -644,4 +645,36 @@ class MathExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 checkEvaluation(BRound(-0.35, 1), -0.4)
 checkEvaluation(BRound(-35, -1), -40)
   }
+
+  test("width_bucket") {
+def test(
+  expr: Double,
+  minValue: Double,
+  maxValue: Double,
+  numBucket: Long,
+  expected: Long): Unit = {
+  checkEvaluation(WidthBucket(Literal.create(expr, DoubleType),
+Literal.create(minValue, DoubleType),
+Literal.create(maxValue, DoubleType),
+Literal.create(numBucket, LongType)),
+expected)
+}
+
+test(5.35, 0.024, 10.06, 5, 3)
+
+test(3.14, 0, 4, 3, 3)
+test(2, 0, 4, 3, 2)
+test(-1, 0, 3.2, 4, 0)
+
+test(3.14, 4, 0, 3, 1)
+test(2, 4, 0, 3, 2)
+test(-1, 3.2, 0, 4, 5)
+
+intercept[AnalysisException]{
--- End diff --

Also add test cases for null input and wrong input type.
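
As a sketch, a null-input case could reuse the same helper pattern as the `test(...)` function above; with `QuaternaryExpression`'s default null propagation, the expected result is `null`. The call shape mirrors the helper, and the specific values are illustrative only:

```scala
// Hypothetical null-input case: any null child should yield null,
// per QuaternaryExpression's default null handling.
checkEvaluation(
  WidthBucket(
    Literal.create(null, DoubleType),
    Literal.create(0.024, DoubleType),
    Literal.create(10.06, DoubleType),
    Literal.create(5L, LongType)),
  null)
```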





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18323#discussion_r123401878
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
 ---
@@ -631,3 +631,109 @@ abstract class TernaryExpression extends Expression {
 }
   }
 }
+
+/**
+ * An expression with four inputs and one output. The output is by default 
evaluated to null
+ * if any input is evaluated to null.
+ */
+abstract class QuaternaryExpression extends Expression {
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability 
of TernaryExpression.
--- End diff --

`TernaryExpression` -> `QuaternaryExpression`; please also fix the other places.





[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...

2017-06-16 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/18323

[SPARK-21117][SQL] Built-in SQL Function Support - WIDTH_BUCKET

## What changes were proposed in this pull request?

Add built-in SQL function `WIDTH_BUCKET`.

Ref: 
https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2137.htm#OLADM717

## How was this patch tested?

Unit tests and manual tests.

Manual tests (compare `spark-sql` results with Oracle):
```sql
select width_bucket(5.35, 0.024, 10.06, 5) from dual; -- 3 --

select width_bucket(3.14, 0, 4, 3) from dual; -- 3 --
select width_bucket(2, 0, 4, 3) from dual;-- 2 --
select width_bucket(-1, 0, 3.2, 4) from dual; -- 0 -- 

select width_bucket(3.14, 4, 0, 3) from dual; -- 1 --
select width_bucket(2, 4, 0, 3) from dual;-- 2 --
select width_bucket(-1, 3.2, 0, 4) from dual; -- 5 --
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-21117

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18323


commit 20fe5679735380b931fb3eab6841c1da376c1742
Author: Yuming Wang 
Date:   2017-06-16T06:28:18Z

Built-in SQL Function Support - WIDTH_BUCKET



