[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21208


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187872304
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
+  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.");
+  }
+  val element = left.eval(input)
+  new GenericArrayData(Array.fill(count.asInstanceOf[Int])(element))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+
+val coreLogic = if (CodeGenerator.isPrimitiveType(et)) {
+  genCodeForPrimitiveElement(ctx, et, element, count, leftGen.isNull, 
ev.value)
+} else {
+  genCodeForNonPrimitiveElement(ctx, element, count, leftGen.isNull, 
ev.value)
+}
+val resultCode = nullElementsProtection(ev, rightGen.isNull, coreLogic)
+
+ev.copy(code =
+  s"""
+  |boolean ${ev.isNull} = false;
+  |${leftGen.code}
+  |${rightGen.code}
+  |${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+  |$resultCode
+  """.stripMargin)
+  }
+
+  private def nullElementsProtection(ev: ExprCode,
+ rightIsNull: String,
--- End diff --

nit: indents


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187858935
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -843,6 +843,82 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
+val dummyFilter = (c: Column) => c.isNull || c.isNotNull // to switch 
codeGen on
+val strDF = Seq(
+("hi", 2),
+(null, 2)
--- End diff --

nit: indent


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187858398
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
+  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.");
+  }
+  val element = left.eval(input)
+  new GenericArrayData(Array.fill(count.asInstanceOf[Int])(element))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+
+val coreLogic = if (CodeGenerator.isPrimitiveType(et)) {
+  genCodeForPrimitiveElement(ctx, et, element, count, leftGen.isNull, 
ev.value)
+} else {
+  genCodeForNonPrimitiveElement(ctx, element, count, leftGen.isNull, 
ev.value)
+}
+val resultCode = nullElementsProtection(ev, rightGen.isNull, coreLogic)
+
+ev.copy(code =
+  s"""
+  |boolean ${ev.isNull} = false;
+  |${leftGen.code}
+  |${rightGen.code}
+  |${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+  |$resultCode
+  """.stripMargin)
+  }
+
+  private def nullElementsProtection(ev: ExprCode,
+ rightIsNull: String,
+ coreLogic: String): String = {
+if (nullable) {
+  s"""
+  |if ($rightIsNull) {
+  |  ${ev.isNull} = true;
+  |} else {
+  |  ${coreLogic}
+  |}
+  """.stripMargin
+} else {
+  coreLogic
+}
+  }
+
+  private def genCodeForNumberOfElements(ctx: CodegenContext, count: 
String): (String, String) = {
+val numElements = ctx.freshName("numElements")
+val numElementsCode =
+  s"""
+  |int $numElements = 0;
+  |if ($count > 0) {
+  |  $numElements = $count;
+  |}
+  |if ($numElements > $MAX_ARRAY_LENGTH) {
+  |  throw new RuntimeException("Unsuccessful try to create array with 
" + $numElements +
+  |" elements due to exceeding the array size limit 
$MAX_ARRAY_LENGTH.");
+  |}
+  """.stripMargin
+
+(numElements, numElementsCode)
+  }
+
+  private def genCodeForPrimitiveElement(ctx: CodegenContext,
+ elementType: DataType,
+ element: String,
+ count: String,
+ leftIsNull: String,
+ arrayDataName: String): String = {
+
+val tempArrayDataName = ctx.freshName("tempArrayData")
+val primitiveValueTypeName = 
CodeGenerator.primitiveTypeName(elementType)
+val (numElemName, numElemCode) = genCodeForNumberOfElements(ctx, count)
+
+s"""
+|$numElemCode
+|${ctx.createUnsafeArray(tempArrayDataName, numElemName, elementType, 
s" $prettyName failed.")}
+|if (!$leftIsNull) {
+|  for (int k = 0; k < $tempArrayDataName.numElements(); k++) {
+|$tempArrayDataName.set$primitiveValueTypeName(k, $element);
+|  }
+|} else {
+|  for (int k = 0; k < $tempArrayDataName.numElements(); k++) {
+|

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187858372
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
+  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.");
+  }
+  val element = left.eval(input)
+  new GenericArrayData(Array.fill(count.asInstanceOf[Int])(element))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+
+val coreLogic = if (CodeGenerator.isPrimitiveType(et)) {
+  genCodeForPrimitiveElement(ctx, et, element, count, leftGen.isNull, 
ev.value)
+} else {
+  genCodeForNonPrimitiveElement(ctx, element, count, leftGen.isNull, 
ev.value)
+}
+val resultCode = nullElementsProtection(ev, rightGen.isNull, coreLogic)
+
+ev.copy(code =
+  s"""
+  |boolean ${ev.isNull} = false;
+  |${leftGen.code}
+  |${rightGen.code}
+  |${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+  |$resultCode
+  """.stripMargin)
+  }
+
+  private def nullElementsProtection(ev: ExprCode,
+ rightIsNull: String,
+ coreLogic: String): String = {
+if (nullable) {
+  s"""
+  |if ($rightIsNull) {
+  |  ${ev.isNull} = true;
+  |} else {
+  |  ${coreLogic}
+  |}
+  """.stripMargin
+} else {
+  coreLogic
+}
+  }
+
+  private def genCodeForNumberOfElements(ctx: CodegenContext, count: 
String): (String, String) = {
+val numElements = ctx.freshName("numElements")
+val numElementsCode =
+  s"""
+  |int $numElements = 0;
+  |if ($count > 0) {
+  |  $numElements = $count;
+  |}
+  |if ($numElements > $MAX_ARRAY_LENGTH) {
+  |  throw new RuntimeException("Unsuccessful try to create array with 
" + $numElements +
+  |" elements due to exceeding the array size limit 
$MAX_ARRAY_LENGTH.");
+  |}
+  """.stripMargin
+
+(numElements, numElementsCode)
+  }
+
+  private def genCodeForPrimitiveElement(ctx: CodegenContext,
+ elementType: DataType,
+ element: String,
+ count: String,
+ leftIsNull: String,
+ arrayDataName: String): String = {
--- End diff --

ditto.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187856067
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
--- End diff --

`since = "2.4.0"`?
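If adopted, the annotation would gain a `since` field next to the usage and examples text already in the PR (a sketch; only the last line is new relative to the quoted diff):

```scala
@ExpressionDescription(
  usage = "_FUNC_(element, count) - Returns the array containing element count times.",
  examples = """
    Examples:
      > SELECT _FUNC_('123', 2);
       ['123', '123']
  """,
  since = "2.4.0")
```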


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187856418
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
--- End diff --

nit: need a space after `elements`.
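Without it, the two interpolated parts concatenate into `...elementsdue to exceeding...`. A corrected throw would be:

```scala
// Trailing space after "elements" keeps the two message parts separated.
throw new RuntimeException(s"Unsuccessful try to create array with $count elements " +
  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.")
```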


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187856635
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
+  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.");
+  }
+  val element = left.eval(input)
+  new GenericArrayData(Array.fill(count.asInstanceOf[Int])(element))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+
+val coreLogic = if (CodeGenerator.isPrimitiveType(et)) {
+  genCodeForPrimitiveElement(ctx, et, element, count, leftGen.isNull, 
ev.value)
+} else {
+  genCodeForNonPrimitiveElement(ctx, element, count, leftGen.isNull, 
ev.value)
+}
+val resultCode = nullElementsProtection(ev, rightGen.isNull, coreLogic)
+
+ev.copy(code =
+  s"""
+  |boolean ${ev.isNull} = false;
+  |${leftGen.code}
+  |${rightGen.code}
+  |${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+  |$resultCode
--- End diff --

nit: usually we use the following format for codegen:

```scala
  s"""
 |
 |
   """.stripMargin
```


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187857608
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1468,3 +1468,149 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  private val MAX_ARRAY_LENGTH = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  if (count.asInstanceOf[Int] > MAX_ARRAY_LENGTH) {
+throw new RuntimeException(s"Unsuccessful try to create array with 
$count elements" +
+  s"due to exceeding the array size limit $MAX_ARRAY_LENGTH.");
+  }
+  val element = left.eval(input)
+  new GenericArrayData(Array.fill(count.asInstanceOf[Int])(element))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+
+val coreLogic = if (CodeGenerator.isPrimitiveType(et)) {
+  genCodeForPrimitiveElement(ctx, et, element, count, leftGen.isNull, 
ev.value)
+} else {
+  genCodeForNonPrimitiveElement(ctx, element, count, leftGen.isNull, 
ev.value)
+}
+val resultCode = nullElementsProtection(ev, rightGen.isNull, coreLogic)
+
+ev.copy(code =
+  s"""
+  |boolean ${ev.isNull} = false;
+  |${leftGen.code}
+  |${rightGen.code}
+  |${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+  |$resultCode
+  """.stripMargin)
+  }
+
+  private def nullElementsProtection(ev: ExprCode,
+ rightIsNull: String,
+ coreLogic: String): String = {
--- End diff --

nit: style.

```scala
  private def nullElementsProtection(
  ev: ExprCode,
  rightIsNull: String,
  coreLogic: String): String = {
...
```


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-13 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187816477
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -798,6 +798,156 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
--- End diff --

Removed a few test cases in 471597a, keeping only the different ways of calling 
repeat with a `count` value of 2 on different types. Let me know if you think I 
should remove even more.
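Based on the test diff quoted earlier in the thread, the retained cases have this shape (the `checkAnswer` expectation below is an illustrative assumption, not copied from the PR):

```scala
test("array_repeat function") {
  val dummyFilter = (c: Column) => c.isNull || c.isNotNull // to switch codeGen on
  val strDF = Seq(
    ("hi", 2),
    (null, 2)
  ).toDF("a", "b")

  // Repeating a null element still yields an array of the requested length.
  checkAnswer(
    strDF.filter(dummyFilter($"a")).select(array_repeat($"a", 2)),
    Seq(Row(Seq("hi", "hi")), Row(Seq(null, null)))
  )
}
```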


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-13 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187816430
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
+ |   ${ev.value}.setNullAt(k);
+ | } else {
+ |   ${ev.value}.set$primitiveValueTypeName(k, $l);
+ | }
+   """.stripMargin
+} 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r187692007
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
+ |   ${ev.value}.setNullAt(k);
+ | } else {
+ |   ${ev.value}.set$primitiveValueTypeName(k, $l);
+ | }
+   """.stripMargin
+} 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186357798
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
--- End diff --

Maybe we can use `ctx.createUnsafeArray()` now?
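A sketch of that simplification, with `ctx.createUnsafeArray` invoked the way the later revision of this PR does (array variable name, element-count variable, element type, error-context string):

```scala
// Replaces the manual numBytes computation, byte[] allocation,
// Platform.putLong and pointTo calls above.
val tempArrayDataName = ctx.freshName("tempArrayData")
val initialization =
  ctx.createUnsafeArray(tempArrayDataName, numElements, et, s" $prettyName failed.")
```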


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-07 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186356981
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
--- End diff --

Yes, overriding `nullSafeCodeGen` is not suitable for this usage.
So I think it would be good to put all code in `doGenCode`, or to create 
another method instead of overriding `nullSafeCodeGen`.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-06 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186292213
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
--- End diff --

It was because if right is null we don't want to do code gen and return 
null instead. Do you think it would make more sense removing this one and put 
all the code in `doGenCode`? This would also fix the problem of doing 
`left.genCode.isNull` in the middle of `doGenCode`.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-06 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186292037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
--- End diff --

So you mean, we should do a size check to make sure it fits in the array, 
and if it doesn't we should do boxing and initialize a `GenericArrayData` 
instead with the given size?
Also when you say `0x7000_`, do you mean the hex value?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186096582
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
+ |   ${ev.value}.setNullAt(k);
+ | } else {
+ |   ${ev.value}.set$primitiveValueTypeName(k, $l);
+ | }
+   """.stripMargin
+} 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186020638
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
--- End diff --

Do we throw an exception for the large number of elements with wider 
elements? For example, `0x7000_` long elements. I think that 
`GenericArrayData` can hold these elements.
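The concern can be made concrete: the generated sizing code multiplies in 32-bit arithmetic (`int numBytes = defaultSize * numElements`), so for 8-byte elements the byte count overflows `int` long before the element count itself is invalid. A hedged sketch of a guard, with illustrative names, that would let the code fall back to a boxed `GenericArrayData` (or raise an error) instead of allocating a corrupt buffer:

```java
public class SizeGuard {
    // Compute the data size in 64-bit arithmetic first so the overflow is
    // detectable; only if the result fits in a non-negative int is the
    // UnsafeArrayData path safe.
    static boolean byteSizeFitsInInt(int numElements, int elementSize) {
        long numBytes = (long) elementSize * numElements;
        return numBytes >= 0 && numBytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(byteSizeFitsInInt(1_000, 8));        // true
        System.out.println(byteSizeFitsInInt(500_000_000, 8));  // false: 4e9 bytes overflows int
    }
}
```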


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186020180
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,132 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code = s"""
+boolean ${ev.isNull} = true;
+${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+$nullSafeEval
+  """)
+} else {
+  ev.copy(code = s"""
+boolean ${ev.isNull} = false;
+${leftGen.code}
+${rightGen.code}
+${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+$resultCode""", isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes = UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   |   + org.apache.spark.unsafe.array.ByteArrayMethods
+   | .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
--- End diff --

Ah, sorry for my misunderstanding. `isPrimitive` does not stand for 
`element is non-null`.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186020256
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
+ |   ${ev.value}.setNullAt(k);
+ | } else {
+ |   ${ev.value}.set$primitiveValueTypeName(k, $l);
+ | }
+   """.stripMargin
+} 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186014334
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -798,6 +798,156 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
+val dummyFilter = (c: Column) => c.isNull || c.isNotNull // to switch codeGen on
+val strDF = Seq(
+("hi", 1),
+(null, 2)
+).toDF("a", "b")
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 0)),
--- End diff --

nit: maybe `$"a"` form is preferred to `df("a")`.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186016026
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
--- End diff --

nit: we can move this into the following if case.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186012320
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
+ |   ${ev.value}.setNullAt(k);
+ | } else {
+ |   ${ev.value}.set$primitiveValueTypeName(k, $l);
+ | }
+   """.stripMargin
+} 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186008236
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
--- End diff --

Why override this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186013623
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -798,6 +798,156 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
--- End diff --

We don't need so many cases here. We only need to verify that the API works end to end.
Evaluation checks of the function should be in `CollectionExpressionsSuite`.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186007739
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
--- End diff --

We should evaluate `left.eval(input)` before `List.fill()()` because 
`List.fill()()` evaluates the second argument every time.

Btw, `Array.fill()()` instead of `List.fill()()`?
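A minimal Scala sketch (not part of the PR) illustrating the point: `fill`'s second argument is by-name, so it is re-evaluated for every element unless the evaluation is hoisted into a `val` first.

```scala
object FillDemo extends App {
  var calls = 0
  def makeElement(): Int = { calls += 1; 42 } // stands in for left.eval(input)

  // Passing the expression directly: it is evaluated once per element.
  List.fill(3)(makeElement())
  assert(calls == 3)

  // Hoisting the evaluation, as suggested: it is evaluated exactly once.
  calls = 0
  val element = makeElement()
  Array.fill(3)(element)
  assert(calls == 1)
}
```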


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186012529
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes =
+   |   UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   | + org.apache.spark.unsafe.array.ByteArrayMethods
+   |   .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
--- End diff --

We shouldn't do `genCode` here.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-04 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r186008908
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,140 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = true;
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $nullSafeEval
+ """.stripMargin
+  )
+} else {
+  ev.copy(code =
+s"""
+   | boolean ${ev.isNull} = false;
+   | ${leftGen.code}
+   | ${rightGen.code}
+   | ${CodeGenerator.javaType(dataType)} ${ev.value} =
+   |   ${CodeGenerator.defaultValue(dataType)};
+   | $resultCode
+ """.stripMargin
+, isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
--- End diff --

nit: we usually don't add a space between `|` and the following sentence if 
the indent is not needed.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185972208
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
--- End diff --

Would this not just throw an exception anyway if unsafeArraySizeInBytes is 
greater than 2^31-1?
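For illustration (not from the PR): when the byte count overflows `Int`, the computed size becomes negative and the allocation fails with a `NegativeArraySizeException` that carries no size information, which is one reason an explicit length check up front is useful. A minimal sketch with a hypothetical element count:

```scala
object OverflowDemo extends App {
  val numElements = Int.MaxValue / 2 // hypothetical huge count
  val numBytes = 8 * numElements     // overflows Int: the result is negative
  assert(numBytes < 0)

  // The allocation throws rather than reporting that the size limit was exceeded.
  val failed =
    try { new Array[Byte](numBytes); false }
    catch { case _: NegativeArraySizeException => true }
  assert(failed)
}
```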


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185971775
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,132 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code = s"""
+boolean ${ev.isNull} = true;
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$nullSafeEval
+  """)
+} else {
+  ev.copy(code = s"""
+boolean ${ev.isNull} = false;
+${leftGen.code}
+${rightGen.code}
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$resultCode""", isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   |   + org.apache.spark.unsafe.array.ByteArrayMethods
+   | .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
--- End diff --

I thought it could be, in this case?
```
val intDF = {
  val schema = StructType(Seq(
StructField("a", IntegerType),
StructField("b", IntegerType)))
  val data = Seq(
Row(1, 1),
Row(3, 2),
Row(null, 2)
  )
  spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
 

[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185966094
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
+  }
+  s"""
+ | for (int k = 0; k < $count; k++) {
+ |   ${updateArray};
+ | }
+   """.stripMargin
+}
+
+val resultCode = s"""
+| if ($count < 0) {
+|   ${initialization("0")}
+| } else {
+|   ${initialization(count)}
+| }
+| ${assignments}
+  """.stripMargin
+
+ev.copy(code = s"""
--- End diff --

Yes sure, updated in commit 7a8610f


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185962223
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,132 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
+
+  override def nullable: Boolean = right.nullable
+
+  override def eval(input: InternalRow): Any = {
+val count = right.eval(input)
+if (count == null) {
+  null
+} else {
+  new 
GenericArrayData(List.fill(count.asInstanceOf[Int])(left.eval(input)))
+}
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def nullSafeCodeGen(ctx: CodegenContext,
+   ev: ExprCode,
+   f: (String, String) => String): ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val resultCode = f(leftGen.value, rightGen.value)
+
+if (nullable) {
+  val nullSafeEval =
+leftGen.code +
+  rightGen.code + ctx.nullSafeExec(right.nullable, 
rightGen.isNull) {
+s"""
+  ${ev.isNull} = false;
+  $resultCode
+"""
+  }
+
+  ev.copy(code = s"""
+boolean ${ev.isNull} = true;
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$nullSafeEval
+  """)
+} else {
+  ev.copy(code = s"""
+boolean ${ev.isNull} = false;
+${leftGen.code}
+${rightGen.code}
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$resultCode""", isNull = FalseLiteral)
+}
+
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+nullSafeCodeGen(ctx, ev, (l, r) => {
+  val et = dataType.elementType
+  val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+  val arrayDataName = ctx.freshName("arrayData")
+  val arrayName = ctx.freshName("arrayObject")
+  val numElements = ctx.freshName("numElements")
+
+  val genNumElements =
+s"""
+   | int $numElements = 0;
+   | if ($r > 0) {
+   |   $numElements = $r;
+   | }
+ """.stripMargin
+
+  val initialization = if (isPrimitive) {
+val arrayName = ctx.freshName("array")
+val baseOffset = Platform.BYTE_ARRAY_OFFSET
+s"""
+   | int numBytes = ${et.defaultSize} * $numElements;
+   | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+   |   + org.apache.spark.unsafe.array.ByteArrayMethods
+   | .roundNumberOfBytesToNearestWord(numBytes);
+   | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+   | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+   | Platform.putLong($arrayName, $baseOffset, $numElements);
+   | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+   | ${ev.value} = $arrayDataName;
+ """.stripMargin
+  } else {
+s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+  }
+
+  val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+  val assignments = {
+val updateArray = if (isPrimitive) {
+  val isNull = left.genCode(ctx).isNull
+  s"""
+ | if ($isNull) {
--- End diff --

IIUC, there is no `null` element when `isPrimitive` is true.


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185962048
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
--- End diff --

Then we put null in the array... I think the update method handles that?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958227
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
--- End diff --

Good point, changed in commit 6e699e0


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958060
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
--- End diff --

Fixed in 596a54f


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958088
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
--- End diff --

Fixed in 596a54f


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958070
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
--- End diff --

Fixed in 596a54f


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958049
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
+  }
+  s"""
+ | for (int k = 0; k < $count; k++) {
--- End diff --

Fixed in 596a54f


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread pepinoflo
Github user pepinoflo commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185958014
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -798,6 +798,111 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
+val strDF = Seq(
+  ("hi", 1),
+  (null, 2)
+).toDF("a", "b")
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 0)),
+  Seq(
+Row(Seq[String]()),
+Row(Seq[String]())
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 1)),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null))
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 2)),
+  Seq(
+Row(Seq("hi", "hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), strDF("b"))),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.selectExpr("array_repeat(a, 2)"),
+  Seq(
+Row(Seq("hi", "hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.selectExpr("array_repeat(a, b)"),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null, null))
+  ))
+
+val intDF = Seq(
+  (1, 1),
+  (3, 2)
+).toDF("a", "b")
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 0)),
+  Seq(
+Row(Seq[Int]()),
+Row(Seq[Int]())
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 1)),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3))
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 2)),
+  Seq(
+Row(Seq(1, 1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), intDF("b"))),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.selectExpr("array_repeat(a, 2)"),
+  Seq(
+Row(Seq(1, 1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.selectExpr("array_repeat(a, b)"),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3, 3))
+  ))
+
+val nullDF = Seq(
+  ("hi", null)
+).toDF("a", "b")
+
+intercept[AnalysisException] {
--- End diff --

Thanks, this is actually the test that confused me about null handling.
Fixed in commit 596a54f


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185861349
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
--- End diff --

What happens if `left` is evaluated as `null`?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185860581
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
--- End diff --

Do we need a size check? For example, 0x7000_ long elements do not fit into `byte[]`.
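For context, the overflow the reviewer is pointing at can be demonstrated with a short runnable sketch (Python, illustrative constants; this is not Spark code):

```python
# Not Spark code: a sketch of why the size check matters. Java's `int`
# wraps on overflow, so `defaultSize * numElements` can turn negative
# before `new byte[...]` is ever executed.

def to_java_int(x: int) -> int:
    """Simulate Java 32-bit signed int arithmetic (wraps on overflow)."""
    return ((x + 2**31) % 2**32) - 2**31

LONG_SIZE = 8               # defaultSize of an 8-byte element type
NUM_ELEMENTS = 0x7000_0000  # an element count in the reviewer's ballpark

num_bytes = to_java_int(LONG_SIZE * NUM_ELEMENTS)
print(num_bytes)  # -2147483648: allocating byte[num_bytes] would fail
```

A negative `numBytes` would make the `new byte[...]` allocation throw, which is why the count needs to be validated before the byte size is computed.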


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185858002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
+  }
+  s"""
+ | for (int k = 0; k < $count; k++) {
+ |   ${updateArray};
+ | }
+   """.stripMargin
+}
+
+val resultCode = s"""
+| if ($count < 0) {
+|   ${initialization("0")}
+| } else {
+|   ${initialization(count)}
+| }
+| ${assignments}
+  """.stripMargin
+
+ev.copy(code = s"""
--- End diff --

Do you want to use the following style here, too?
```
s"""
   |
   |
 """
```


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185538873
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
--- End diff --

```
scala> List.fill(null.asInstanceOf[Integer])("abc")
java.lang.NullPointerException
  at scala.Predef$.Integer2int(Predef.scala:362)
  ... 51 elided
```
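The implementation that eventually landed (quoted at the top of this thread) checks the evaluated count before filling the array. Those semantics can be modeled in a runnable sketch (Python, not Spark code; `MAX_ARRAY_LENGTH` is a stand-in for `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`):

```python
# Model of the null-safe eval semantics: a null count yields a null
# result, an oversized count raises, and a negative count yields an
# empty array (mirroring the codegen branch that initializes zero
# elements for count < 0).
MAX_ARRAY_LENGTH = 2**31 - 1 - 15  # illustrative stand-in constant

def array_repeat_eval(element, count):
    if count is None:
        return None  # propagate null instead of raising an NPE
    if count > MAX_ARRAY_LENGTH:
        raise RuntimeError(
            f"Unsuccessful try to create array with {count} elements "
            f"due to exceeding the array size limit {MAX_ARRAY_LENGTH}.")
    return [element] * max(count, 0)

print(array_repeat_eval("abc", None))  # None, where List.fill raised
```

`array_repeat_eval('abc', None)` returns `None` where the original `List.fill(null.asInstanceOf[Integer])` call blew up with a `NullPointerException`.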


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185540189
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
--- End diff --

What about `null` elements?


---
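For nullable elements the primitive fast path cannot simply call the `set<Type>` setter; a null slot has to be written explicitly. A hedged model of that per-element branch (Python, illustrative; the real fix lives in the generated Java):

```python
# Model (not Spark code) of the per-element write for a nullable element:
# a primitive setter cannot store null, so a null element needs a
# setNullAt-style write instead of set<Type>.
def write_repeated(element, element_is_null, count):
    out = []
    for _ in range(max(count, 0)):
        if element_is_null:
            out.append(None)     # analogue of arrayData.setNullAt(k)
        else:
            out.append(element)  # analogue of arrayData.set<Type>(k, element)
    return out
```

This is also why `dataType` declares `ArrayType(left.dataType, left.nullable)`: the result array only needs to admit nulls when the repeated element can be null.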




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185540852
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
+
+  override def eval(input: InternalRow): Any = {
+new 
GenericArrayData(List.fill(right.eval(input).asInstanceOf[Integer])(left.eval(input)))
+  }
+
+  override def prettyName: String = "array_repeat"
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val element = leftGen.value
+val count = rightGen.value
+val et = dataType.elementType
+val isPrimitive = CodeGenerator.isPrimitiveType(et)
+
+val arrayDataName = ctx.freshName("arrayData")
+val arrayName = ctx.freshName("arrayObject")
+val initialization = (numElements: String) => if (isPrimitive) {
+  val arrayName = ctx.freshName("array")
+  val baseOffset = Platform.BYTE_ARRAY_OFFSET
+  s"""
+ | int numBytes = ${et.defaultSize} * $numElements;
+ | int unsafeArraySizeInBytes = 
UnsafeArrayData.calculateHeaderPortionInBytes($numElements)
+ |   + org.apache.spark.unsafe.array.ByteArrayMethods
+ | .roundNumberOfBytesToNearestWord(numBytes);
+ | byte[] $arrayName = new byte[unsafeArraySizeInBytes];
+ | UnsafeArrayData $arrayDataName = new UnsafeArrayData();
+ | Platform.putLong($arrayName, $baseOffset, $numElements);
+ | $arrayDataName.pointTo($arrayName, $baseOffset, 
unsafeArraySizeInBytes);
+ | ${ev.value} = $arrayDataName;
+   """.stripMargin
+} else {
+  s"${ev.value} = new ${classOf[GenericArrayData].getName()}(new 
Object[$numElements]);"
+}
+
+val primitiveValueTypeName = CodeGenerator.primitiveTypeName(et)
+val assignments = {
+  val updateArray = if (isPrimitive) {
+s"${ev.value}.set$primitiveValueTypeName(k, $element);"
+  } else {
+s"${ev.value}.update(k, $element);"
+  }
+  s"""
+ | for (int k = 0; k < $count; k++) {
--- End diff --

What if `count` is `null`?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185534457
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
+
+  override def dataType: ArrayType = ArrayType(left.dataType, 
left.nullable)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expected = IntegerType
+if (!expected.acceptsType(right.dataType)) {
+  val mismatch = s"argument 2 requires ${expected.simpleString} type, 
" +
+s"however, '${right.sql}' is of ${right.dataType.simpleString} 
type."
+  TypeCheckResult.TypeCheckFailure(mismatch)
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def nullable: Boolean = false
--- End diff --

What about cases when `right` is evaluated to `null`?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185544657
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -798,6 +798,111 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("array_repeat function") {
+val strDF = Seq(
+  ("hi", 1),
+  (null, 2)
+).toDF("a", "b")
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 0)),
+  Seq(
+Row(Seq[String]()),
+Row(Seq[String]())
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 1)),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null))
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), 2)),
+  Seq(
+Row(Seq("hi", "hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.select(array_repeat(strDF("a"), strDF("b"))),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.selectExpr("array_repeat(a, 2)"),
+  Seq(
+Row(Seq("hi", "hi")),
+Row(Seq(null, null))
+  ))
+
+checkAnswer(
+  strDF.selectExpr("array_repeat(a, b)"),
+  Seq(
+Row(Seq("hi")),
+Row(Seq(null, null))
+  ))
+
+val intDF = Seq(
+  (1, 1),
+  (3, 2)
+).toDF("a", "b")
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 0)),
+  Seq(
+Row(Seq[Int]()),
+Row(Seq[Int]())
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 1)),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3))
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), 2)),
+  Seq(
+Row(Seq(1, 1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.select(array_repeat(intDF("a"), intDF("b"))),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.selectExpr("array_repeat(a, 2)"),
+  Seq(
+Row(Seq(1, 1)),
+Row(Seq(3, 3))
+  ))
+
+checkAnswer(
+  intDF.selectExpr("array_repeat(a, b)"),
+  Seq(
+Row(Seq(1)),
+Row(Seq(3, 3))
+  ))
+
+val nullDF = Seq(
+  ("hi", null)
+).toDF("a", "b")
+
+intercept[AnalysisException] {
--- End diff --

```
scala> val nullDF = Seq(
 | ("hi", null)
 | ).toDF("a", "b")
nullDF: org.apache.spark.sql.DataFrame = [a: string, b: null]

scala> nullDF.printSchema
root
 |-- a: string (nullable = true)
 |-- b: null (nullable = true)
```

Please can you also add a test case when `b` is `int (nullable=true)`?


---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-02 Thread mn-mikke
Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21208#discussion_r185532437
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1229,3 +1229,98 @@ case class Flatten(child: Expression) extends 
UnaryExpression {
 
   override def prettyName: String = "flatten"
 }
+
+/**
+ * Returns the array containing the given input value (left) count (right) 
times.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(element, count) - Returns the array containing element 
count times.",
+  examples = """
+Examples:
+  > SELECT _FUNC_('123', 2);
+   ['123', '123']
+  """)
+case class ArrayRepeat(left: Expression, right: Expression)
+  extends BinaryExpression {
--- End diff --

What about simplifying the type checking with the snippet below?
```
with ExpectsInputTypes {
 override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType, 
IntegerType)
...
```
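The pattern the reviewer suggests centralizes the argument-type check in a declarative declaration instead of a hand-written `checkInputDataTypes`. The idea, modeled generically in a runnable sketch (Python, illustrative names; the real trait is Catalyst's `ExpectsInputTypes`):

```python
# Generic model of declarative input-type checking: declare the expected
# type per argument once, and derive success or a mismatch message
# uniformly instead of hand-rolling the check in each expression.
ANY_TYPE = object()  # analogue of AnyDataType: accepts every type

def check_input_types(expected, actual):
    """Return None on success, else a message for the first mismatch."""
    for i, (exp, act) in enumerate(zip(expected, actual), start=1):
        if exp is not ANY_TYPE and exp != act:
            return (f"argument {i} requires {exp} type, "
                    f"however, argument is of {act} type.")
    return None
```

For example, `check_input_types([ANY_TYPE, "int"], ["string", "null"])` reports a mismatch for argument 2, matching the shape of the hand-written check above.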



---




[GitHub] spark pull request #21208: [SPARK-23925][SQL] Add array_repeat collection fu...

2018-05-01 Thread pepinoflo
GitHub user pepinoflo opened a pull request:

https://github.com/apache/spark/pull/21208

[SPARK-23925][SQL] Add array_repeat collection function

## What changes were proposed in this pull request?

The PR adds a new collection function, array_repeat. As there already was a 
function repeat with the same signature, with the only difference being the 
expected return type (String instead of Array), the new function is called 
array_repeat to distinguish.
The behaviour of the function is based on Presto's one.

The function creates an array containing a given element repeated the 
requested number of times.

## How was this patch tested?

New unit tests added into:
- CollectionExpressionsSuite
- DataFrameFunctionsSuite


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pepinoflo/spark SPARK-23925

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21208


commit f5a23cf8321ddca315c92fdf7366974469ce6395
Author: Florent Pépin 
Date:   2018-05-01T18:07:34Z

[SPARK-23925][SQL] Add array_repeat function

commit ce2a9ddd1abf91f3c1abe1bf4b437a063dccfecb
Author: Florent Pépin 
Date:   2018-05-01T20:00:34Z

Merge master into branch

commit 88d84252eb87e9d16b0e274db6db007133999e78
Author: Florent Pépin 
Date:   2018-05-01T21:12:38Z

Fix false string to FalseLiteral




---
