[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21053 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182486210 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,28 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@ignore_unicode_prefix +@since(2.4) +def element_at(col, value): --- End diff -- Sure, done --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182362454 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,28 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@ignore_unicode_prefix +@since(2.4) +def element_at(col, value): --- End diff -- How about `extraction` which is from `Column.apply(extraction)`? cc @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
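For reference, a minimal sketch of how the PySpark wrapper might look if the second parameter is renamed to `extraction` as suggested above. This is illustrative only, not the merged code; it assumes Spark's internal helpers `_to_java_column` / `Column` and the JVM-side `functions.element_at` added by this PR, and it omits the `@since` / `@ignore_unicode_prefix` decorators discussed elsewhere in this thread.

```python
from pyspark import SparkContext
from pyspark.sql.column import Column, _to_java_column


def element_at(col, extraction):
    """
    Collection function: returns the element of the array at the given (1-based)
    index if `col` is an array, or the value for the given key if `col` is a map.

    :param col: name of column containing the array or map
    :param extraction: index to look up in the array, or key to look up in the map
    """
    sc = SparkContext._active_spark_context
    # Delegate to the JVM-side implementation, as the other collection functions do.
    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
```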
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182350430 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,28 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@ignore_unicode_prefix +@since(2.4) +def element_at(col, value): --- End diff -- Umm, the 2nd argument acts as `index` for an array or `key` for a map. This is why I used `value`. Can I use `index`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182325781 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -417,3 +417,106 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. Returns NULL if the index exceeds the length of the array. --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182327230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -417,3 +417,106 @@ case class ArrayMax(child: Expression) extends UnaryExpression with ImplicitCast override def prettyName: String = "array_max" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. Returns NULL if the index exceeds the length of the array. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } --- End diff -- nit: how about: ```scala override def dataType: DataType = left.dataType match { case ArrayType(elementType, _) => elementType case MapType(_, valueType, _) => valueType } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182323290 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,28 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@ignore_unicode_prefix +@since(2.4) +def element_at(col, value): +""" +Collection function: returns element of array at given index in value if col is array. +returns value for the given key in value if col is map. + +:param col: name of column containing array or map +:param value: value to check for in array or key to check for in map --- End diff -- Could you add a note or something to mention that the index is 1-based? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
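As an illustration of what such a note could demonstrate (a sketch assuming Spark 2.4+ with this PR applied and a local SparkSession): indices are 1-based, and negative indices count from the end.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import element_at  # available once this PR is in (Spark 2.4+)

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(["a", "b", "c"],)], ["data"])

# Indices are 1-based: element_at(data, 1) returns "a", not "b".
df.select(element_at(df.data, 1)).show()
# A negative index counts from the end: -1 returns "c".
df.select(element_at(df.data, -1)).show()
```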
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r182326476 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,28 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@ignore_unicode_prefix +@since(2.4) +def element_at(col, value): --- End diff -- The name `value` is confusing because this is actually an index or a key, isn't it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181793795 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true --- End diff -- yeah, it may depend on the `right` value, too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181790679 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,27 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@since(2.4) --- End diff -- I see, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181687320 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true + + override def nullSafeEval(value: Any, ordinal: Any): Any = { +left.dataType match { + case _: ArrayType => +val array = value.asInstanceOf[ArrayData] +val index = ordinal.asInstanceOf[Int] +if (array.numElements() < math.abs(index)) { + null +} else { + val idx = if (index == 0) { +throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1") + } else if (index > 0) { +index - 1 + } else { +array.numElements() + index + } + if (array.isNullAt(idx)) { +null + } else { +array.get(idx, dataType) + } +} + case _: MapType => +getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType) +} + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +left.dataType match { + case _: ArrayType => +nullSafeCodeGen(ctx, ev, (eval1, eval2) => { + val index = ctx.freshName("elementAtIndex") + val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) { +s""" + |if ($eval1.isNullAt($index)) { + | ${ev.isNull} = true; + |} else + """ + } else { +"" + } + s""" + |int $index = (int) $eval2; + |if ($eval1.numElements() < Math.abs($index)) { + | ${ev.isNull} = true; + |} else { + | if ($index == 0) { + |throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1"); + | } else if ($index > 0) { + |$index--; + | } else { + |$index += $eval1.numElements(); + | } + | $nullCheck + | { + |${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)}; + | } + |} + """ --- End diff -- good catch, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181624253 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true --- End diff -- ah, I see. Invalid `right` can cause null result too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181623803 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true --- End diff -- I'm afraid it's wrong because this returns `null` when the given index is "out of bounds" (`array.numElements() < math.abs(index)`) for array type or the given key doesn't exist for map type. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
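To make the null cases concrete, a small illustrative example (assuming Spark 2.4+ with this PR applied) of the two situations that force `nullable = true` even when the inputs themselves are non-null:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import element_at

spark = SparkSession.builder.getOrCreate()

# An out-of-bounds index on an array returns NULL rather than raising an error.
arr_df = spark.createDataFrame([([1, 2, 3],)], ["data"])
arr_df.select(element_at(arr_df.data, 5)).show()    # expected: null

# A key that is not present in the map also returns NULL.
map_df = spark.createDataFrame([({"a": 1.0},)], ["data"])
map_df.select(element_at(map_df.data, "z")).show()  # expected: null
```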
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181623729 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true + + override def nullSafeEval(value: Any, ordinal: Any): Any = { +left.dataType match { + case _: ArrayType => +val array = value.asInstanceOf[ArrayData] +val index = ordinal.asInstanceOf[Int] +if (array.numElements() < math.abs(index)) { + null +} else { + val idx = if (index == 0) { +throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1") + } else if (index > 0) { +index - 1 + } else { +array.numElements() + index + } + if (array.isNullAt(idx)) { --- End diff -- nit: ```scala if(left.dataType.asInstanceOf[ArrayType].containsNull && array.isNullAt(idx)) { ... } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181623519 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. --- End diff -- nit: `Returns NULL if the index exceeds the length of the array`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181622879 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true --- End diff -- Maybe? ```scala override def nullable: Boolean = left.dataType match { case a: ArrayType => a.containsNull case m: MapType => m.valueContainsNull } || left.nullable || right.nullable ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181619426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true + + override def nullSafeEval(value: Any, ordinal: Any): Any = { +left.dataType match { + case _: ArrayType => +val array = value.asInstanceOf[ArrayData] +val index = ordinal.asInstanceOf[Int] +if (array.numElements() < math.abs(index)) { + null +} else { + val idx = if (index == 0) { +throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1") + } else if (index > 0) { +index - 1 + } else { +array.numElements() + index + } + if (array.isNullAt(idx)) { +null + } else { +array.get(idx, dataType) + } +} + case _: MapType => +getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType) +} + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +left.dataType match { + case _: ArrayType => +nullSafeCodeGen(ctx, ev, (eval1, eval2) => { + val index = ctx.freshName("elementAtIndex") + val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) { +s""" + |if ($eval1.isNullAt($index)) { + | ${ev.isNull} = true; + |} else + """ + } else { +"" + } + s""" + |int $index = (int) $eval2; + |if ($eval1.numElements() < Math.abs($index)) { + | ${ev.isNull} = true; + |} else { + | if ($index == 0) { + |throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1"); + | } else if ($index > 0) { + |$index--; + | } else { + |$index += $eval1.numElements(); + | } + | $nullCheck + | { + |${ev.value} = ${CodeGenerator.getValue(eval1, dataType, index)}; + | } + |} + """ --- End diff -- `stripMargin` is missing? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181618055 --- Diff: python/pyspark/sql/functions.py --- @@ -1846,6 +1846,27 @@ def array_contains(col, value): return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) +@since(2.4) --- End diff -- Do we need to annotate this with `@ignore_unicode_prefix`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181617303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -354,3 +336,37 @@ case class GetMapValue(child: Expression, key: Expression) }) } } + +/** + * Returns the value of key `key` in Map `child`. + * + * We need to do type checking here as `key` expression maybe unresolved. + */ +case class GetMapValue(child: Expression, key: Expression) extends GetMapValueUtil + with ExtractValue with NullIntolerant { --- End diff -- nit: maybe `extends ...` should be on this line. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181619450 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or the value for key `right` in Map `left`. + */ +@ExpressionDescription( + usage = """ +_FUNC_(array, index) - Returns element of array at given index. If index < 0, accesses elements + from the last to the first. + +_FUNC_(map, key) - Returns value for given key, or NULL if the key is not contained in the map + """, + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), 2); + 2 + > SELECT _FUNC_(map(1, 'a', 2, 'b'), 2); + "b" + """, + since = "2.4.0") +case class ElementAt(left: Expression, right: Expression) extends GetMapValueUtil { + + override def dataType: DataType = left.dataType match { +case _: ArrayType => left.dataType.asInstanceOf[ArrayType].elementType +case _: MapType => left.dataType.asInstanceOf[MapType].valueType + } + + override def inputTypes: Seq[AbstractDataType] = { +Seq(TypeCollection(ArrayType, MapType), + left.dataType match { +case _: ArrayType => IntegerType +case _: MapType => left.dataType.asInstanceOf[MapType].keyType + } +) + } + + override def nullable: Boolean = true + + override def nullSafeEval(value: Any, ordinal: Any): Any = { +left.dataType match { + case _: ArrayType => +val array = value.asInstanceOf[ArrayData] +val index = ordinal.asInstanceOf[Int] +if (array.numElements() < math.abs(index)) { + null +} else { + val idx = if (index == 0) { +throw new ArrayIndexOutOfBoundsException("SQL array indices start at 1") + } else if (index > 0) { +index - 1 + } else { +array.numElements() + index + } + if (array.isNullAt(idx)) { +null + } else { +array.get(idx, dataType) + } +} + case _: MapType => +getValueEval(value, ordinal, left.dataType.asInstanceOf[MapType].keyType) +} + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +left.dataType match { + case _: ArrayType => +nullSafeCodeGen(ctx, ev, (eval1, eval2) => { + val index = ctx.freshName("elementAtIndex") + val nullCheck = if (left.dataType.asInstanceOf[ArrayType].containsNull) { +s""" + |if ($eval1.isNullAt($index)) { + | ${ev.isNull} = true; + |} else + """ --- End diff -- `stripMargin` is missing? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181610494 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { ) } + test("element at function") { --- End diff -- good catch, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { ) } + test("element at function") { --- End diff -- also the function is element_at, not "element at" ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181529901 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -413,6 +413,78 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext { ) } + test("element at function") { --- End diff -- why do we need so many test cases here? this is just to verify the api works end to end. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181475737 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or key right` in Map `left`. --- End diff -- good catch, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21053#discussion_r181376088 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +287,106 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Returns the value of index `right` in Array `left` or key right` in Map `left`. --- End diff -- We can improve this doc. There is also a broken quote. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21053: [SPARK-23924][SQL] Add element_at function
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/21053 [SPARK-23924][SQL] Add element_at function ## What changes were proposed in this pull request? This PR adds the SQL function element_at. The behavior of the function is based on Presto's. The function returns the element of an array at the given (1-based) index if the column is an array, or the value for the given key if the column is a map. ## How was this patch tested? Added UTs You can merge this pull request into a Git repository by running: $ git pull https://github.com/kiszk/spark SPARK-23924 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21053.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21053 commit d7ec6ccc177d07a8090ac27ce1659f427d1cf50a Author: Kazuaki Ishizaki Date: 2018-04-12T13:21:22Z initial commit --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
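As a quick end-to-end illustration of the proposed SQL surface (a sketch, to be run against a build that includes this patch), the function can be exercised through `spark.sql`, mirroring the examples in the ExpressionDescription:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Array case: 1-based index, so this selects the second element.
spark.sql("SELECT element_at(array(1, 2, 3), 2)").show()       # expected: 2
# Map case: look up the value for key 2.
spark.sql("SELECT element_at(map(1, 'a', 2, 'b'), 2)").show()  # expected: b
```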