[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182624162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -505,3 +505,59 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if the given value could not be found in the array. Returns null if either of + * the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first element in the array has + * index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") --- End diff -- Just wanted to note that we can use `note` here too: https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103 I am mentioning this because we are adding many functions now :-). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21037 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182298375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. --- End diff -- nit: remove `that`? `The first character in str has index 1.` -> `The first element in the array has index 1.` or something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182297896 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null --- End diff -- `Returns 0 if substr could not be found in str` -> `Returns 0 if the value could not be found in the array` or something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182298195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable + } --- End diff -- We don't need this here because this is the same as the one in `BinaryExpression`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182316985 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -18,6 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.dsl.expressions._ --- End diff -- nit: unnecessary import? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r182297625 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,23 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@since(2.4) +def array_position(col, value): +""" +Collection function: Locates the position of the first occurrence of the given value +in the given array. Returns null if either of the arguments are null. + +.. note:: The position is not zero based, but 1 based index. Returns 0 if substr +could not be found in str. --- End diff -- `Returns 0 if substr could not be found in str` -> `Returns 0 if the value could not be found in the array` or something? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181833370 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == value) { +return (i + 1).toLong + } +) +0L + } + + override def prettyName: String = "array_position" + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val pos = ctx.freshName("arrayPosition") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, right.dataType, i) + s""" + |int ${pos} = 0; + |for (int $i = 0; $i < $arr.numElements(); $i ++) { + | if (${ctx.genEqual(right.dataType, value, getValue)}) { --- End diff -- I totally agree with you. I added one test case for this. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181827177 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) --- End diff -- According to [these UTs in Presto](https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java#L651-L656), you are right. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181826436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull --- End diff -- ah, you are right. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181825716 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == value) { +return (i + 1).toLong + } +) +0L + } + + override def prettyName: String = "array_position" + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val pos = ctx.freshName("arrayPosition") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, right.dataType, i) + s""" + |int ${pos} = 0; + |for (int $i = 0; $i < $arr.numElements(); $i ++) { + | if (${ctx.genEqual(right.dataType, value, getValue)}) { + |${pos} = $i + 1; + |break; + | } + |} + |${ev.value} = (long) $pos; + """ --- End diff -- good catch, sorry again --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181614224 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull --- End diff -- I guess `left.dataType.asInstanceOf[ArrayType].containsNull` is not related to the nullability of this function? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181613270 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == value) { +return (i + 1).toLong + } +) +0L + } + + override def prettyName: String = "array_position" + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val pos = ctx.freshName("arrayPosition") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, right.dataType, i) + s""" + |int ${pos} = 0; --- End diff -- nit: `$pos` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181616266 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -105,4 +106,26 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(ArrayContains(a3, Literal("")), null) checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), null) } + + --- End diff -- nit: remove an extra line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181613044 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == value) { +return (i + 1).toLong + } +) +0L + } + + override def prettyName: String = "array_position" + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val pos = ctx.freshName("arrayPosition") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, right.dataType, i) + s""" + |int ${pos} = 0; + |for (int $i = 0; $i < $arr.numElements(); $i ++) { + | if (${ctx.genEqual(right.dataType, value, getValue)}) { + |${pos} = $i + 1; + |break; + | } + |} + |${ev.value} = (long) $pos; + """ --- End diff -- `stripMargin` is missing? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181615751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the position of the first occurrence of element in the given array as long. + * Returns 0 if substr could not be found in str. Returns null if either of the arguments are null + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = LongType + override def inputTypes: Seq[AbstractDataType] = +Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType) + + override def nullable: Boolean = { +left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull + } + + override def nullSafeEval(arr: Any, value: Any): Any = { +arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) => + if (v == value) { +return (i + 1).toLong + } +) +0L + } + + override def prettyName: String = "array_position" + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, (arr, value) => { + val pos = ctx.freshName("arrayPosition") + val i = ctx.freshName("i") + val getValue = CodeGenerator.getValue(arr, right.dataType, i) + s""" + |int ${pos} = 0; + |for (int $i = 0; $i < $arr.numElements(); $i ++) { + | if (${ctx.genEqual(right.dataType, value, getValue)}) { --- End diff -- I guess we need to check null for each array element? E.g. for the test `Array Position` in `CollectionExpressionsSuite`, if `a0` is as follows: ``` val a0 = Literal.create(Seq(1, null, 2, 3), ArrayType(IntegerType)) ``` `checkEvaluation(ArrayPosition(a0, Literal(0)), 0L)` will fail. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181614388 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) --- End diff -- So we can't know the position of `null` in the array even if the array contains `null`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181283242 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and --- End diff -- Good catch, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181154371 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) --- End diff -- Ah, if an array in `left` contains a `null` element, as you can see [a UT](https://github.com/apache/spark/pull/21037/files#diff-d31eca9f1c4c33104dc2cb8950486910R121), it works as an usual element. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181151969 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) --- End diff -- Good question. I cannot find the behavior in Presto from the document and test cases. Thus, I leave as-is. Should we return `null`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181101544 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, element) - Returns the (1-based) index of the first element of the array as long. + """, + examples = """ +Examples: + > SELECT _FUNC_(array(3, 2, 1), 1); + 3 + """, + since = "2.4.0") +case class ArrayPosition(left: Expression, right: Expression) --- End diff -- What is the behavior when `left` contains null element? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r181100763 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * A function that returns the position of the first occurrence of element in the given array + * as long. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and --- End diff -- Looks like this is a broken sentence? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r180743190 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: Expression) } } +/** + * A function that returns the position of the first occurrence of substr in the given string + * as BigInt. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`. + """, + examples = """ +Examples: + > SELECT _FUNC_('SparkSQL', 'SQL'); + 6 + """) --- End diff -- Oh, good catch --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21037#discussion_r180739909 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: Expression) } } +/** + * A function that returns the position of the first occurrence of substr in the given string + * as BigInt. Returns 0 if substr could not be found in str. + * Returns null if either of the arguments are null and + * + * NOTE: that this is not zero based, but 1-based index. The first character in str has index 1. + */ +@ExpressionDescription( + usage = """ +_FUNC_(str, substr) - Returns the (1-based) index of the first occurrence of `substr` in `str`. + """, + examples = """ +Examples: + > SELECT _FUNC_('SparkSQL', 'SQL'); + 6 + """) --- End diff -- `since` :)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/21037 [SPARK-23919][SQL] Add array_position function ## What changes were proposed in this pull request? The PR adds the SQL function `array_position`. The behavior of the function is based on Presto's one. The function returns the position of the first occurrence of substr in the given string using 1-based index as BigInt. ## How was this patch tested? Added UTs You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-23919 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21037.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21037 commit 80fc7b0cb5c61345d05bee971b13e44727b1e108 Author: Kazuaki IshizakiDate: 2018-04-11T06:21:07Z initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org