[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user NihalHarish closed the pull request at: https://github.com/apache/spark/pull/21482

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r198039290

--- Diff: python/pyspark/sql/column.py ---
@@ -514,6 +514,17 @@ def isin(self, *cols):
 desc_nulls_first = ignore_unicode_prefix(_unary_op("desc_nulls_first", _desc_nulls_first_doc))
 desc_nulls_last = ignore_unicode_prefix(_unary_op("desc_nulls_last", _desc_nulls_last_doc))
 
+_isInf_doc = """
+True if the current expression is inf.
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([
+Row(name=u'Tom', height=80.0),
+Row(name=u'Alice', height=float('inf'))
--- End diff --

Thanks!
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197989956

--- Diff: python/pyspark/sql/column.py ---
@@ -514,6 +514,17 @@ def isin(self, *cols):
 desc_nulls_first = ignore_unicode_prefix(_unary_op("desc_nulls_first", _desc_nulls_first_doc))
 desc_nulls_last = ignore_unicode_prefix(_unary_op("desc_nulls_last", _desc_nulls_last_doc))
 
+_isInf_doc = """
+True if the current expression is inf.
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([
+Row(name=u'Tom', height=80.0),
+Row(name=u'Alice', height=float('inf'))
--- End diff --

Dots are required, as noted in https://github.com/apache/spark/pull/21482#discussion_r197626112
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197626112

--- Diff: python/pyspark/sql/column.py ---
@@ -514,6 +514,16 @@ def isin(self, *cols):
 desc_nulls_first = ignore_unicode_prefix(_unary_op("desc_nulls_first", _desc_nulls_first_doc))
 desc_nulls_last = ignore_unicode_prefix(_unary_op("desc_nulls_last", _desc_nulls_last_doc))
 
+_isInf_doc = """
+True if the current expression is inf.
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([\
+Row(name=u'Tom', height=80.0),\
+Row(name=u'Alice', height=float('inf'))])
--- End diff --

nit:

```
>>> df = spark.createDataFrame([
...     Row(name=u'Tom', height=80.0),
...     Row(name=u'Alice', height=float('inf'))
... ])
```
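The `...` continuation markers requested in the nit above are not cosmetic: doctest treats any line following a `>>>` prompt that does not start with `...` as expected output rather than as part of the statement. A minimal, hypothetical sketch (not part of the PR) showing that the dotted form parses and passes:

```python
import doctest

# Hypothetical docstring using the "..." continuation style the review asks for.
docstring = """
>>> heights = [
...     80.0,
...     float('inf'),
... ]
>>> any(h == float('inf') for h in heights)
True
"""

# Parse and run the embedded examples; the dotted lines are treated as one
# statement, so both examples execute and succeed.
test = doctest.DocTestParser().get_doctest(docstring, {}, "continuation_example", None, 0)
results = doctest.DocTestRunner().run(test)
assert results.failed == 0
```

Without the dots, doctest would interpret the wrapped `Row(...)` lines as expected output of the `createDataFrame` call and report a mismatch.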
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197538274

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
--- End diff --

Added it in my latest commit.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197340906

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
--- End diff --

Yes, please, because I see it's exposed in Column.scala.
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197234189

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
--- End diff --

@HyukjinKwon could you clarify, please?
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r194827936

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
--- End diff --

Do you want me to add the function to column.py?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r194611249

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
--- End diff --

Shall we expose this in column.py too?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r194610745

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

Ah, I got it now. I believe we should match it to one side, though. I roughly remember we keep functions in this_naming_style in functions[.py|.R|.scala], e.g. [SPARK-10621](https://issues.apache.org/jira/browse/SPARK-10621). Shall we stick to `isinf` then?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r194170615

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

I commented on this here: https://github.com/apache/spark/pull/21482#discussion_r192577235. It's because `Column` capitalizes it differently from `functions`.
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r194119670

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala ---
@@ -56,6 +56,16 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     assert(ex.contains("Null value appeared in non-nullable field"))
   }
 
+  test("IsInf") {
+    checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true)
+    checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true)
+    checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true)
+    checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true)
+    checkEvaluation(IsInf(Literal.create(null, DoubleType)), false)
+    checkEvaluation(IsInf(Literal(Float.MaxValue)), false)
+    checkEvaluation(IsInf(Literal(5.5f)), false)
--- End diff --

Added the checks in my later commits.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193941785

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

I think we shouldn't add other variants unless there's a clear reason. It sounds like we are adding this for no reason.

> how about we just go with isInf for now and if other aliases are needed in the future they can be added and discussed then?

Yea, please.
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193902999

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

Added tests in my latest commit.
Github user henryr commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193898812

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

Do you know if there's any consistency behind the different capitalization schemes? There's `format_number`, `isnan` and `isNotNull` here, for example. If not, how about we just go with `isInf` for now, and if other aliases are needed in the future they can be added and discussed then?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193630041

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

I really don't understand why we have both. It usually has the ones matching the Scala side, or an R-specific function. Otherwise, I don't think we should have both.
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193230476

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

The functions are case-insensitive, so I don't think we need both?
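The point above can be illustrated with a toy registry. This is a hedged sketch of the general idea, not Spark's actual `FunctionRegistry` implementation: when lookup normalizes the case of a name, a second alias that differs only in capitalization is redundant.

```python
# Toy case-insensitive function registry (an illustration only): names are
# lower-cased on both registration and lookup, so "isInf", "isinf", and
# "ISINF" all resolve to the same single entry.
registry = {}

def register(name, fn):
    registry[name.lower()] = fn

def lookup(name):
    return registry[name.lower()]

register("isInf", lambda x: x == float("inf") or x == float("-inf"))

assert lookup("isinf")(float("inf")) is True
assert lookup("ISINF") is lookup("isInf")  # one entry serves every spelling
```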
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192931084

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if expr evaluates to infinite else returns false ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
--- End diff --

I think we will only see double and float types here because of `inputTypes`.
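The `eval` method quoted above returns false for null input and for NaN (NaN is neither positive nor negative infinity), and true only for the two infinities. A small Python analogue of those semantics, using `math.isinf` as a stand-in for Scala's `isInfinity`:

```python
import math

def is_inf(value):
    # Mirrors the quoted eval(): a null input yields false rather than null,
    # NaN and finite values yield false; only +/- infinity yields true.
    if value is None:
        return False
    return math.isinf(value)

assert is_inf(float("inf")) and is_inf(float("-inf"))
assert not is_inf(None)
assert not is_inf(float("nan"))
assert not is_inf(3.4e38)  # large but finite, like Float.MaxValue
```

This matches the cases exercised in the `NullExpressionsSuite` test earlier in the thread.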
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192818734

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if expr evaluates to infinite else returns false ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
--- End diff --

The function can only test for infinity on the Double and Float data types, and hence we need to match the child's data type against these types.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192577235

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1107,6 +1107,14 @@ object functions {
    */
   def input_file_name(): Column = withExpr { InputFileName() }
 
+  /**
+   * Return true iff the column is Infinity.
+   *
+   * @group normal_funcs
+   * @since 2.4.0
+   */
+  def isinf(e: Column): Column = withExpr { IsInf(e.expr) }
--- End diff --

I guess someone should elaborate whether Column's isFoo vs functions' isfoo is the right pattern we want to stay with...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192577261

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

Add tests for these?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192577245

--- Diff: R/pkg/R/generics.R ---
@@ -695,6 +695,12 @@ setGeneric("getField", function(x, ...) { standardGeneric("getField") })
 
 #' @rdname columnfunctions
 setGeneric("getItem", function(x, ...) { standardGeneric("getItem") })
 
+#' @rdname columnfunctions
+setGeneric("isInf", function(x) { standardGeneric("isInf") })
+
+#' @rdname columnfunctions
+setGeneric("isinf", function(x) { standardGeneric("isinf") })
--- End diff --

`isnan` (lower case) is not a column function; see https://github.com/NihalHarish/spark/blob/7e396f70f58ffd309e7f738751f3aa8cfe321ce7/R/pkg/R/generics.R#L1002. `@rdname columnfunctions` will cause it to go to the wrong doc page.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192577270

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,17 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @note isinf since 2.4.0
--- End diff --

Missing `@aliases`.
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192575235

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1107,6 +1107,14 @@ object functions {
    */
   def input_file_name(): Column = withExpr { InputFileName() }
 
+  /**
+   * Return true iff the column is Infinity.
+   *
+   * @group normal_funcs
+   * @since 2.4.0
+   */
+  def isinf(e: Column): Column = withExpr { IsInf(e.expr) }
--- End diff --

I have followed what seemed to be the pre-existing convention for function names in those particular files. For example, in `functions.scala` all function names are lower case, but in `Column.scala` all function names are camel case.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574664

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
--- End diff --

For the `isNaN` case, `functions.R` only defines `isnan`; `isNaN` is defined in `column.R`. If we really want both `isInf` and `isinf`, like `isNaN` and `isnan`, following that approach is better than having two duplicate method definitions here.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574620

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if expr evaluates to infinite else returns false ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
--- End diff --

Is this match block necessary, since there is only one case pattern?
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574603

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
--- End diff --

Would it be alright if I omit `isInf` completely and only implement `isinf`?
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574582

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
--- End diff --

`@aliases` is incorrect here.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574568

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
--- End diff --

We don't need a duplicate method definition like this. Maybe we can follow `isNaN`'s approach if we really need both `isinf` and `isInf`.
Github user NihalHarish commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574396

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

I have just followed what has been done for `isnan`, which also has `isNan`.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574389

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

Yeah, I think we should not have both `isInf` and `isinf`. Since we have `isNaN`, `isNull`, and so on, maybe `isInf` is more consistent.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192574339

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns true if expr evaluates to infinite else returns false ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
+        ev.copy(code = code"""
+          ${eval.code}
+          ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
--- End diff --

We can assign `ev.value` directly here, without a default value.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192551752

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc)
+            column(jc)
+          })
+
+#' @details
+#' \code{isInf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isInf",
--- End diff --

I like the idea, but we might not have a way to extend it (sort of):

```
> showMethods("is.finite")
Function: is.finite (package base)
> is.finite
function (x) .Primitive("is.finite")
```

It looks like S3 without a generic.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192551777

--- Diff: R/pkg/NAMESPACE ---
@@ -281,6 +281,8 @@ exportMethods("%<=>%",
               "initcap",
               "input_file_name",
               "instr",
+              "isInf",
+              "isinf",
--- End diff --

I really appreciate the attempt to include R, though a question: why do we have both `isInf` and `isinf`?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549119

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ",
--- End diff --

True -> true, False -> false, to be consistent.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192549111

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
 
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
 
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns True if expr evaluates to infinite else returns False ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
--- End diff --

Can you run the example and check the results?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547440

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
+          signature(x = "Column"),
+          function(x) {
+            jc <- callJStatic("org.apache.spark.sql.functions", "isinf", x@jc)
+            column(jc)
+          })
+
+#' @details
+#' \code{isInf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isInf",
--- End diff --

R has `is.infinite`. Can we match the behaviour and rename it?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547376

--- Diff: R/pkg/R/functions.R ---
@@ -907,6 +907,30 @@ setMethod("initcap",
             column(jc)
           })
 
+#' @details
+#' \code{isinf}: Returns true if the column is Infinity.
+#' @rdname column_nonaggregate_functions
+#' @aliases isnan isnan,Column-method
+#' @note isinf since 2.4.0
+setMethod("isinf",
--- End diff --

R has `is.infinite`. Can we match the behaviour and rename it?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r192547364

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())
 
+
+@since(2.4)
+def isinf(col):
+    """An expression that returns true iff the column is NaN.
--- End diff --

Ditto. Is this the same as `isnan`?
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192547355

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -557,6 +557,14 @@ class Column(val expr: Expression) extends Logging {
     (this >= lowerBound) && (this <= upperBound)
   }
+  /**
+   * True if the current expression is NaN.
--- End diff --

? is this the same with `isNaN`?
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192547274

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1107,6 +1107,14 @@ object functions {
    */
   def input_file_name(): Column = withExpr { InputFileName() }
+  /**
+   * Return true iff the column is Infinity.
+   *
+   * @group normal_funcs
+   * @since 2.4.0
+   */
+  def isinf(e: Column): Column = withExpr { IsInf(e.expr) }
--- End diff --

Mind if I ask to elaborate `isinf` vs `isInf` across the APIs?
Github user NihalHarish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192541712

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
+        ev.copy(code = code"""
+          ${eval.code}
+          ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+          ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""",
--- End diff --

The non-codegen version uses the `isInfinity` method defined for Scala's Double and Float, whereas the codegen version uses Java's static method `isInfinite` defined for the classes Double and Float.
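For context, the two APIs discussed here agree on every input, so the codegen and non-codegen paths should produce the same answers. A minimal standalone check (the class name is made up for illustration and is not part of the PR):

```java
// Java's static Double.isInfinite / Float.isInfinite return true only for
// positive or negative infinity, and false for NaN and all finite values.
// Scala's isInfinity instance methods give the same answers for these inputs.
public class IsInfDemo {
    public static void main(String[] args) {
        System.out.println(Double.isInfinite(1.0 / 0.0));                // true  (+infinity)
        System.out.println(Double.isInfinite(Double.NEGATIVE_INFINITY)); // true
        System.out.println(Double.isInfinite(0.0 / 0.0));                // false (NaN)
        System.out.println(Double.isInfinite(Double.MAX_VALUE));         // false (finite)
        System.out.println(Float.isInfinite(Float.POSITIVE_INFINITY));   // true
    }
}
```

Given that equivalence, the choice between the two in the PR is a matter of style rather than behavior.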
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192522027

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ",
--- End diff --

"True evaluates" -> "True if expr evaluates"
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192520713

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala ---
@@ -56,6 +56,16 @@ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     assert(ex.contains("Null value appeared in non-nullable field"))
   }
+  test("IsInf") {
+    checkEvaluation(IsInf(Literal(Double.PositiveInfinity)), true)
+    checkEvaluation(IsInf(Literal(Double.NegativeInfinity)), true)
+    checkEvaluation(IsInf(Literal(Float.PositiveInfinity)), true)
+    checkEvaluation(IsInf(Literal(Float.NegativeInfinity)), true)
+    checkEvaluation(IsInf(Literal.create(null, DoubleType)), false)
+    checkEvaluation(IsInf(Literal(Float.MaxValue)), false)
+    checkEvaluation(IsInf(Literal(5.5f)), false)
--- End diff --

check NaN as well?
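Since NaN and infinity are distinct IEEE-754 special values, the suggested NaN case would be expected to evaluate to false. A small standalone illustration (the class name is hypothetical and this is not part of the test suite):

```java
// NaN is not infinite and infinity is not NaN: isInfinite is false for NaN,
// and isNaN is false for +/- infinity. This is why an extra test case such as
// IsInf(Literal(Double.NaN)) would be expected to yield false.
public class NaNVsInf {
    public static void main(String[] args) {
        System.out.println(Double.isInfinite(Double.NaN)); // false
        System.out.println(Double.isNaN(Double.NaN));      // true
        System.out.println(Double.isInfinite(1.0 / 0.0));  // true  (+infinity)
        System.out.println(Double.isNaN(1.0 / 0.0));       // false
        System.out.println(Float.isInfinite(Float.NaN));   // false
    }
}
```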
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192521881

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala ---
@@ -199,6 +199,50 @@ case class Nvl2(expr1: Expression, expr2: Expression, expr3: Expression, child:
   override def sql: String = s"$prettyName(${expr1.sql}, ${expr2.sql}, ${expr3.sql})"
 }
+/**
+ * Evaluates to `true` iff it's Infinity.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns True evaluates to infinite else returns False ",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1/0);
+       True
+      > SELECT _FUNC_(5);
+       False
+  """)
+case class IsInf(child: Expression) extends UnaryExpression
+  with Predicate with ImplicitCastInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(DoubleType, FloatType))
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Boolean = {
+    val value = child.eval(input)
+    if (value == null) {
+      false
+    } else {
+      child.dataType match {
+        case DoubleType => value.asInstanceOf[Double].isInfinity
+        case FloatType => value.asInstanceOf[Float].isInfinity
+      }
+    }
+  }
+
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val eval = child.genCode(ctx)
+    child.dataType match {
+      case DoubleType | FloatType =>
+        ev.copy(code = code"""
+          ${eval.code}
+          ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+          ${ev.value} = !${eval.isNull} && Double.isInfinite(${eval.value});""",
--- End diff --

out of interest, why use `Double.isInfinite` here, but `value.isInfinity` in the non-codegen version?
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192520834

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1107,6 +1107,14 @@ object functions {
    */
   def input_file_name(): Column = withExpr { InputFileName() }
+  /**
+   * Return true iff the column is Infinity.
+   *
+   * @group normal_funcs
+   * @since 1.6.0
--- End diff --

Need to fix these versions, here and elsewhere. This change would land in Spark 2.4.0.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21482#discussion_r192520566

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala ---
@@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
 import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
 import org.apache.spark.sql.types._
-class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
+ class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
--- End diff --

Revert this?
GitHub user NihalHarish opened a pull request:

    https://github.com/apache/spark/pull/21482

    [SPARK-24393][SQL] SQL builtin: isinf

## What changes were proposed in this pull request?

Implemented isinf to test if a float or double value is Infinity.

## How was this patch tested?

Unit tests have been added to sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NullExpressionsSuite.scala

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NihalHarish/spark SPARK-24393-SQL-builtin-isinf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21482.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21482

commit bcdaab2f8c9c5afc877d3a54f658296aba78fdf0
Author: Nihal Harish
Date: 2018-06-01T21:23:24Z

    [SPARK-24393][SQL] SQL builtin: isinf