[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182624162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -505,3 +505,59 @@ case class ArrayMax(child: Expression) extends 
UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if the given value could not be found in the array. Returns 
null if either of
+ * the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first element 
in the array has
+ *   index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
--- End diff --

Just wanted to note that we can use `note` here too:


https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103

I am mentioning this because we are adding many functions now :-).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21037


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182298375
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends 
UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
--- End diff --

nit: remove `that`?
`The first character in str has index 1.` -> `The first element in the 
array has index 1.` or something?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182297896
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends 
UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
--- End diff --

`Returns 0 if substr could not be found in str` -> `Returns 0 if the value 
could not be found in the array` or something?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182298195
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -353,3 +353,61 @@ case class ArrayMax(child: Expression) extends 
UnaryExpression with ImplicitCast
 
   override def prettyName: String = "array_max"
 }
+
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable
+  }
--- End diff --

We don't need this here because this is the same as the one in 
`BinaryExpression`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182316985
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 ---
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions
 
 import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.dsl.expressions._
--- End diff --

nit: unnecessary import?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r182297625
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1846,6 +1846,23 @@ def array_contains(col, value):
 return Column(sc._jvm.functions.array_contains(_to_java_column(col), 
value))
 
 
+@since(2.4)
+def array_position(col, value):
+"""
+Collection function: Locates the position of the first occurrence of 
the given value
+in the given array. Returns null if either of the arguments are null.
+
+.. note:: The position is not zero based, but 1 based index. Returns 0 
if substr
+could not be found in str.
--- End diff --

`Returns 0 if substr could not be found in str` -> `Returns 0 if the value 
could not be found in the array` or something?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181833370
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == value) {
+return (i + 1).toLong
+  }
+)
+0L
+  }
+
+  override def prettyName: String = "array_position"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val pos = ctx.freshName("arrayPosition")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  s"""
+ |int ${pos} = 0;
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
--- End diff --

I totally agree with you. I added one test case for this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181827177
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
--- End diff --

According to [these UTs in 
Presto](https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java#L651-L656),
 you are right.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181826436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
--- End diff --

ah, you are right.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-16 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181825716
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == value) {
+return (i + 1).toLong
+  }
+)
+0L
+  }
+
+  override def prettyName: String = "array_position"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val pos = ctx.freshName("arrayPosition")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  s"""
+ |int ${pos} = 0;
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
+ |${pos} = $i + 1;
+ |break;
+ |  }
+ |}
+ |${ev.value} = (long) $pos;
+   """
--- End diff --

good catch, sorry again


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181614224
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
--- End diff --

I guess `left.dataType.asInstanceOf[ArrayType].containsNull` is not related 
to the nullability of this function?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181613270
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == value) {
+return (i + 1).toLong
+  }
+)
+0L
+  }
+
+  override def prettyName: String = "array_position"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val pos = ctx.freshName("arrayPosition")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  s"""
+ |int ${pos} = 0;
--- End diff --

nit: `$pos`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181616266
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
 ---
@@ -105,4 +106,26 @@ class CollectionExpressionsSuite extends SparkFunSuite 
with ExpressionEvalHelper
 checkEvaluation(ArrayContains(a3, Literal("")), null)
 checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), 
null)
   }
+
+
--- End diff --

nit: remove an extra line.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181613044
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == value) {
+return (i + 1).toLong
+  }
+)
+0L
+  }
+
+  override def prettyName: String = "array_position"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val pos = ctx.freshName("arrayPosition")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  s"""
+ |int ${pos} = 0;
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
+ |${pos} = $i + 1;
+ |break;
+ |  }
+ |}
+ |${ev.value} = (long) $pos;
+   """
--- End diff --

`stripMargin` is missing?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181615751
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,60 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Returns the position of the first occurrence of element in the given 
array as long.
+ * Returns 0 if substr could not be found in str. Returns null if either 
of the arguments are null
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = LongType
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(ArrayType, left.dataType.asInstanceOf[ArrayType].elementType)
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || 
left.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(arr: Any, value: Any): Any = {
+arr.asInstanceOf[ArrayData].foreach(right.dataType, (i, v) =>
+  if (v == value) {
+return (i + 1).toLong
+  }
+)
+0L
+  }
+
+  override def prettyName: String = "array_position"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+nullSafeCodeGen(ctx, ev, (arr, value) => {
+  val pos = ctx.freshName("arrayPosition")
+  val i = ctx.freshName("i")
+  val getValue = CodeGenerator.getValue(arr, right.dataType, i)
+  s"""
+ |int ${pos} = 0;
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  if (${ctx.genEqual(right.dataType, value, getValue)}) {
--- End diff --

I guess we need to check null for each array element?

E.g. for the test `Array Position` in `CollectionExpressionsSuite`, if `a0` 
is as follows:

```
val a0 = Literal.create(Seq(1, null, 2, 3), ArrayType(IntegerType))
```

`checkEvaluation(ArrayPosition(a0, Literal(0)), 0L)` will fail.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181614388
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
--- End diff --

So we can't know the position of `null` in the array even if the array 
contains `null`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181283242
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
--- End diff --

Good catch, thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181154371
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
--- End diff --

Ah, if an array in `left` contains a `null` element, as you can see [a 
UT](https://github.com/apache/spark/pull/21037/files#diff-d31eca9f1c4c33104dc2cb8950486910R121),
 it works as an usual element.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-12 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181151969
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
--- End diff --

Good question. I cannot find the behavior in Presto from the document and 
test cases. Thus, I leave as-is.
Should we return `null`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181101544
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(array, element) - Returns the (1-based) index of the first 
element of the array as long.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(3, 2, 1), 1);
+   3
+  """,
+  since = "2.4.0")
+case class ArrayPosition(left: Expression, right: Expression)
--- End diff --

What is the behavior when `left` contains null element?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r181100763
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,61 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * A function that returns the position of the first occurrence of element 
in the given array
+ * as long. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
--- End diff --

Looks like this is a broken sentence?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r180743190
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: 
Expression)
   }
 }
 
+/**
+ * A function that returns the position of the first occurrence of substr 
in the given string
+ * as BigInt. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(str, substr) - Returns the (1-based) index of the first 
occurrence of `substr` in `str`.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_('SparkSQL', 'SQL');
+   6
+  """)
--- End diff --

Oh, good catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21037#discussion_r180739909
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1002,6 +1002,57 @@ case class StringInstr(str: Expression, substr: 
Expression)
   }
 }
 
+/**
+ * A function that returns the position of the first occurrence of substr 
in the given string
+ * as BigInt. Returns 0 if substr could not be found in str.
+ * Returns null if either of the arguments are null and
+ *
+ * NOTE: that this is not zero based, but 1-based index. The first 
character in str has index 1.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(str, substr) - Returns the (1-based) index of the first 
occurrence of `substr` in `str`.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_('SparkSQL', 'SQL');
+   6
+  """)
--- End diff --

`since` :)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21037: [SPARK-23919][SQL] Add array_position function

2018-04-11 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/21037

[SPARK-23919][SQL] Add array_position function

## What changes were proposed in this pull request?

The PR adds the SQL function `array_position`. The behavior of the function 
is based on Presto's one.

The function returns the position of the first occurrence of substr in the 
given string using 1-based index as BigInt.

## How was this patch tested?

Added UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-23919

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21037.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21037


commit 80fc7b0cb5c61345d05bee971b13e44727b1e108
Author: Kazuaki Ishizaki 
Date:   2018-04-11T06:21:07Z

initial commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org