
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-03 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13976


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69395123
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,38 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n  [1,a]\n  [2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function $prettyName should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) => et
+  }
+
+  private lazy val numFields = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val inputArray = child.eval(input).asInstanceOf[ArrayData]
+    if (inputArray == null) {
+      Nil
+    } else {
+      for (i <- 0 until inputArray.numElements())
+        yield inputArray.getStruct(i, numFields)
--- End diff --

ah i see, `for-yield` returns an iterator.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-03 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69394891
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,38 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n  [1,a]\n  [2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function $prettyName should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) => et
+  }
+
+  private lazy val numFields = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val inputArray = child.eval(input).asInstanceOf[ArrayData]
+    if (inputArray == null) {
+      Nil
+    } else {
+      for (i <- 0 until inputArray.numElements())
+        yield inputArray.getStruct(i, numFields)
--- End diff --

Thank you, @cloud-fan. By the way, @rxin gave me this advice on the first
commit of this PR:
> we don't need to materialize the array, do we? We can create an iterator to return the results.
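
A minimal sketch of the distinction behind this advice, in plain Scala with no Spark dependency. `FakeArrayData` and its `reads` counter are hypothetical stand-ins for illustration only (Spark's `ArrayData.getStruct` also takes a field count). It suggests that a `for`/`yield` over a `Range` builds its whole result eagerly, whereas mapping over `Range.iterator` reads elements only on demand.

```scala
object InlineIteratorSketch {
  // Hypothetical stand-in for Spark's ArrayData: only the two calls used in the diff.
  final class FakeArrayData(rows: Array[Seq[Any]]) {
    var reads = 0 // counts element accesses so laziness can be observed
    def numElements(): Int = rows.length
    def getStruct(i: Int): Seq[Any] = { reads += 1; rows(i) }
  }

  // Strict: for/yield over a Range desugars to Range.map,
  // which builds the whole result before returning.
  def strict(a: FakeArrayData): IndexedSeq[Seq[Any]] =
    for (i <- 0 until a.numElements()) yield a.getStruct(i)

  // Lazy: each row is read only when the consumer calls next().
  def lazily(a: FakeArrayData): Iterator[Seq[Any]] =
    (0 until a.numElements()).iterator.map(i => a.getStruct(i))
}
```

Either form can be returned where a `TraversableOnce` is expected; only the second avoids materializing every row before the consumer asks for it.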



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-03 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69388270
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,38 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n  [1,a]\n  [2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function $prettyName should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) => et
+  }
+
+  private lazy val numFields = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val inputArray = child.eval(input).asInstanceOf[ArrayData]
+    if (inputArray == null) {
+      Nil
+    } else {
+      for (i <- 0 until inputArray.numElements())
+        yield inputArray.getStruct(i, numFields)
--- End diff --

I'm not sure how `for-yield` performs; maybe it's safer to create an array
manually and use a while loop here?
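
For comparison, a hedged sketch of the two options being weighed here, written in plain Scala. `rowsWithWhile`, `rowsWithForYield`, and the `getStruct` function parameter are hypothetical names, not Spark's API. Which one is faster in practice would need to be measured; the sketch only shows that both produce the same rows.

```scala
import scala.reflect.ClassTag

object InlineWhileLoopSketch {
  // Manual version: preallocate the output array once and fill it with a while loop.
  def rowsWithWhile[T: ClassTag](numElements: Int, getStruct: Int => T): Array[T] = {
    val rows = new Array[T](numElements)
    var i = 0
    while (i < numElements) {
      rows(i) = getStruct(i)
      i += 1
    }
    rows
  }

  // for/yield version, same shape as the code under review.
  def rowsWithForYield[T](numElements: Int, getStruct: Int => T): IndexedSeq[T] =
    for (i <- 0 until numElements) yield getStruct(i)
}
```

The while loop avoids the closure call per element that `Range.map` makes, at the cost of more code and an explicitly materialized array.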



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69372303
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ---
@@ -89,4 +89,32 @@ class GeneratorFunctionSuite extends QueryTest with SharedSQLContext {
       exploded.join(exploded, exploded("i") === exploded("i")).agg(count("*")),
       Row(3) :: Nil)
   }
+
+  test("inline raises exception on empty array") {
--- End diff --

Yep. That's clearer.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69372281
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
--- End diff --

Oh, my god. I was too naive, here.
Thank you!



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69371427
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ---
@@ -89,4 +89,32 @@ class GeneratorFunctionSuite extends QueryTest with SharedSQLContext {
       exploded.join(exploded, exploded("i") === exploded("i")).agg(count("*")),
       Row(3) :: Nil)
   }
+
+  test("inline raises exception on empty array") {
--- End diff --

`on array of null type`?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69371397
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
--- End diff --

why not return `et` directly?
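
A small sketch of why the two are interchangeable, using simplified stand-ins rather than Spark's classes. `Field` and `Struct` are hypothetical models of `StructField` and `StructType` (the real `StructField` also carries metadata, which this model omits). Copying each field's name, type, and nullability unchanged yields a value equal to the original struct, so the rebuild adds nothing.

```scala
object ElementSchemaSketch {
  // Hypothetical, simplified stand-ins for Spark's StructField / StructType.
  final case class Field(name: String, dataType: String, nullable: Boolean)
  final case class Struct(fields: Seq[Field])

  // Mirrors the shape of the code under review: copy every field unchanged.
  // The index from zipWithIndex is never used.
  def rebuilt(et: Struct): Struct =
    Struct(et.fields.zipWithIndex.map {
      case (f, _) => Field(f.name, f.dataType, nullable = f.nullable)
    })

  // The suggested simplification: return the element type as-is.
  def direct(et: Struct): Struct = et
}
```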



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69321672
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
+        case (field, index) => StructField(field.name, field.dataType, nullable = field.nullable)
+      })
+  }
+
+  private lazy val ncol = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match {
--- End diff --

Oh, it really is unnecessary now. It was there to handle NullType before. I'll remove it.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69321184
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
--- End diff --

Yep. Currently, our type checker ensures that the input is a homogeneous array of StructType.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69301324
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ---
@@ -89,4 +91,30 @@ class GeneratorFunctionSuite extends QueryTest with SharedSQLContext {
       exploded.join(exploded, exploded("i") === exploded("i")).agg(count("*")),
       Row(3) :: Nil)
   }
+
+  test("inline with empty table or empty array") {
--- End diff --

The test name is misleading: we do allow an empty array. The problem is that
`array()` returns an array of null type, which fails the type check.
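
To make the distinction concrete, here is a hedged sketch using a miniature type model in plain Scala. The `DataType` hierarchy and `check` below are hypothetical simplifications, not Spark's classes; `check` only mirrors the shape of `checkInputDataTypes` in the diff. The check looks at the declared element type, not at how many elements the array holds, so an empty array of structs passes while an array whose element type is null does not.

```scala
object InlineTypeCheckSketch {
  // Hypothetical miniature of the data types involved; not Spark's classes.
  sealed trait DataType
  case object NullType extends DataType
  case object IntType extends DataType
  final case class StructType(fieldNames: Seq[String]) extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Same shape as checkInputDataTypes in the diff: accept only array-of-struct.
  def check(dt: DataType): Either[String, Unit] = dt match {
    case ArrayType(_: StructType) => Right(())
    case other =>
      Left(s"input to function inline should be array of struct type, not $other")
  }
}
```

In this model, `check(ArrayType(StructType(Seq("id"))))` succeeds regardless of the array's contents, while `check(ArrayType(NullType))` fails, which is the case the test actually exercises.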



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69301056
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
       PosExplode(CreateArray(str_array.map(Literal(_)))),
       str_correct_answer.map(InternalRow.fromSeq(_)))
   }
+
+  test("inline") {
+    val correct_answer = Seq(
+      Seq(0, UTF8String.fromString("a")),
--- End diff --

We can create a row directly in the test by calling `create_row(...)`.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69301097
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
       PosExplode(CreateArray(str_array.map(Literal(_)))),
       str_correct_answer.map(InternalRow.fromSeq(_)))
   }
+
+  test("inline") {
+    val correct_answer = Seq(
+      Seq(0, UTF8String.fromString("a")),
--- End diff --

It also converts strings to UTF8String for us.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69300935
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
       PosExplode(CreateArray(str_array.map(Literal(_)))),
       str_correct_answer.map(InternalRow.fromSeq(_)))
   }
+
+  test("inline") {
+    val correct_answer = Seq(
+      Seq(0, UTF8String.fromString("a")),
+      Seq(1, UTF8String.fromString("b")),
+      Seq(2, UTF8String.fromString("c")))
+
+    checkTuple(
+      Inline(Literal.create(Array(), ArrayType(StructType(Seq(StructField("id1", LongType)))))),
--- End diff --

We usually use `new StructType().add("id", LongType)` to create a struct type.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69300771
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
+        case (field, index) => StructField(field.name, field.dataType, nullable = field.nullable)
+      })
+  }
+
+  private lazy val ncol = elementSchema.fields.length
--- End diff --

I'd like to name it `numFields`



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69300727
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
+        case (field, index) => StructField(field.name, field.dataType, nullable = field.nullable)
+      })
+  }
+
+  private lazy val ncol = elementSchema.fields.length
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match {
--- End diff --

Why do we pattern match here?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-07-01 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69300459
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      StructType(et.fields.zipWithIndex.map {
--- End diff --

hmm, so it's just `et` now?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69233560
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -68,4 +68,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
       PosExplode(CreateArray(str_array.map(Literal(_)))),
       str_correct_answer.map(InternalRow.fromSeq(_)))
   }
+
+  test("inline") {
+    val correct_answer = Seq(
+      Seq(0, UTF8String.fromString("a")),
+      Seq(1, UTF8String.fromString("b")),
+      Seq(2, UTF8String.fromString("c")))
+
+    checkTuple(
+      Inline(CreateArray(Seq.empty)),
--- End diff --

We can use `Literal.create(Array(), ArrayType(...))` to create an empty
array with the expected type.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69233198
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      var schema = new StructType()
+      for (f <- et.fields.zipWithIndex) {
+        schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = f._1.nullable)
+      }
+      schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match {
+    case ArrayType(NullType, _) => Nil
--- End diff --

We still need the expression-level unit tests...



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69232964
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
--- End diff --

Indeed. I'll move the test case into the end-to-end test suite.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69232649
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
--- End diff --

The expression evaluation tests won't go through the type-check process.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69232584
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
+case ArrayType(et : StructType, _) =>
+  val inputArray = child.eval(input).asInstanceOf[ArrayData]
+  if (inputArray == null) {
+Nil
+  } else {
+for (i <- 0 until inputArray.numElements())
+  yield 
InternalRow(inputArray.array(i).asInstanceOf[GenericInternalRow].values: _*)
--- End diff --

yup



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69231096
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ---
@@ -89,4 +89,11 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSQLContext {
   exploded.join(exploded, exploded("i") === 
exploded("i")).agg(count("*")),
   Row(3) :: Nil)
   }
+
+  test("single inline") {
+val df = Seq((1, Seq(1, 2, 3))).toDF("a", "intList")
+checkAnswer(
+  df.selectExpr("inline(array(struct(10, 100), struct(20, 200), 
struct(30, 300)))"),
--- End diff --

Yep. I'll add that.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69230634
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
+case ArrayType(et : StructType, _) =>
+  val inputArray = child.eval(input).asInstanceOf[ArrayData]
+  if (inputArray == null) {
+Nil
+  } else {
+for (i <- 0 until inputArray.numElements())
+  yield 
InternalRow(inputArray.array(i).asInstanceOf[GenericInternalRow].values: _*)
--- End diff --

Do you mean `getStruct(int ordinal, int numFields)`?
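
For reference, a sketch of how the loop could read struct elements through that accessor instead of casting each element to `GenericInternalRow`. It assumes `spark-catalyst` is on the classpath; the object `InlineRows`, the helper `rows`, and the sample data are hypothetical, and in the real expression `numFields` would come from the element `StructType`.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.unsafe.types.UTF8String

object InlineRows {
  // Returns one row per struct element, or no rows when the array is null.
  def rows(inputArray: ArrayData, numFields: Int): Seq[InternalRow] =
    if (inputArray == null) Nil
    else (0 until inputArray.numElements()).map(i => inputArray.getStruct(i, numFields))

  // Sample input: an array of two structs in Catalyst's internal representation.
  val sample: ArrayData = new GenericArrayData(Array[Any](
    InternalRow(1, UTF8String.fromString("a")),
    InternalRow(2, UTF8String.fromString("b"))))
}
```

Going through `getStruct` keeps the code independent of the concrete row class stored in the array.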



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69230198
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
--- End diff --

I'm also wondering how this passes type checking.



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69230152
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
--- End diff --

Actually, I added this to pass the `Seq.empty` test case.

https://github.com/apache/spark/pull/13976/files#diff-6715134a4e95980149a7600ecb71674cR80

But now I think I added it without thinking it through.




[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229632
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
--- End diff --

Oh, I overrode the field names in the `named_struct` case.
Sorry, I'll fix this. Actually, using `f._1.name` is consistent.
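
A sketch of what a name-preserving schema could look like, assuming only the public `org.apache.spark.sql.types` API. The object `InlineSchema` and its standalone `elementSchema(childType)` helper are hypothetical stand-ins for the method in the diff, not the final code of this PR.

```scala
import org.apache.spark.sql.types._

object InlineSchema {
  // Builds the generator's output schema from the array's element struct,
  // keeping each field's own name instead of renaming to col1, col2, ...
  def elementSchema(childType: DataType): StructType = childType match {
    case ArrayType(et: StructType, _) =>
      StructType(et.fields.map(f => StructField(f.name, f.dataType, f.nullable)))
    case other =>
      throw new IllegalArgumentException(s"expected array of struct, got $other")
  }
}
```

With this shape, names supplied through `named_struct` carry through to the output columns.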



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229314
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
--- End diff --

Sure. Thanks, @cloud-fan !



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229355
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala ---
@@ -89,4 +89,11 @@ class GeneratorFunctionSuite extends QueryTest with 
SharedSQLContext {
   exploded.join(exploded, exploded("i") === 
exploded("i")).agg(count("*")),
   Row(3) :: Nil)
   }
+
+  test("single inline") {
+val df = Seq((1, Seq(1, 2, 3))).toDF("a", "intList")
+checkAnswer(
+  df.selectExpr("inline(array(struct(10, 100), struct(20, 200), 
struct(30, 300)))"),
--- End diff --

We also need a test with a FROM clause.
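
One possible shape for such a test, written as a standalone sketch rather than in the `QueryTest` style used by the suite. It assumes a Spark 2.x+ `SparkSession` with the `inline` function from this PR available; the object `InlineFromClause`, the view name `t`, and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object InlineFromClause {
  // Runs inline() over columns read from a view, so the generator input is
  // resolved from the FROM clause rather than built from literals.
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._
    Seq((1, 2), (3, 4)).toDF("a", "b").createOrReplaceTempView("t")
    spark.sql("SELECT inline(array(struct(a, b))) FROM t").collect().toSeq
  }
}
```

Inside the suite itself, the same query would more naturally go through `checkAnswer` against the expected rows.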



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229227
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
+case ArrayType(et : StructType, _) =>
+  val inputArray = child.eval(input).asInstanceOf[ArrayData]
+  if (inputArray == null) {
+Nil
+  } else {
+for (i <- 0 until inputArray.numElements())
+  yield 
InternalRow(inputArray.array(i).asInstanceOf[GenericInternalRow].values: _*)
--- End diff --

isn't it just `inputArray.getStruct(i)`?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229017
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
--- End diff --

Actually, that's the best!




[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69229018
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
+  }
+  schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = 
child.dataType match {
+case ArrayType(NullType, _) => Nil
--- End diff --

how can this pass the type check?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69228969
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala
 ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends 
ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, 
position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n 
[1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with 
Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType 
match {
+case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+  TypeCheckResult.TypeCheckSuccess
+case _ =>
+  TypeCheckResult.TypeCheckFailure(
+s"input to function inline should be array of struct type, not 
${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+case ArrayType(et : StructType, _) =>
+  var schema = new StructType()
+  for (f <- et.fields.zipWithIndex) {
+schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = 
f._1.nullable)
--- End diff --

Is it Hive's rule to use `col1`, `col2`, etc. instead of the struct's 
field names?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69228897
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -195,3 +195,43 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals
   extended = "> SELECT _FUNC_(array(10,20));\n  0\t10\n  1\t20")
 // scalastyle:on line.size.limit
 case class PosExplode(child: Expression) extends ExplodeBase(child, position = true)
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.",
+  extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      var schema = new StructType()
+      for (f <- et.fields.zipWithIndex) {
+        schema = schema.add(s"col${f._2 + 1}", f._1.dataType, nullable = f._1.nullable)
+      }
+      schema
--- End diff --

how about
```
StructType(et.fields.zipWithIndex.map {
  case (field, index) =>
    StructField(s"col${index + 1}", field.dataType, nullable = field.nullable)
})
```
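
For anyone who wants to try the `map`-based construction outside Spark, here is a minimal sketch. It is only an illustration under stated assumptions: `Field` and `positionalSchema` are hypothetical stand-ins for Spark's `StructField` and the body of `elementSchema`, not Spark APIs. Note that the interpolation has to be written `${index + 1}` for the expression to be evaluated.

```scala
// Hypothetical stand-in for Spark's StructField, so the sketch runs without Spark.
final case class Field(name: String, dataType: String, nullable: Boolean)

object InlineSchemaSketch {
  // Rename every field positionally (col1, col2, ...) while keeping
  // its data type and nullability unchanged.
  def positionalSchema(fields: Seq[Field]): Seq[Field] =
    fields.zipWithIndex.map { case (field, index) =>
      field.copy(name = s"col${index + 1}")
    }
}
```

Because `map` returns a new collection, no mutable accumulator is needed and the original field order is preserved.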


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69117412
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      et.fields.zipWithIndex.foldLeft(new StructType()) { case (schema, (f, i)) =>
+        schema.add(s"col${i + 1}", f.dataType, nullable = f.nullable)
+      }
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      val inputArray = child.eval(input).asInstanceOf[ArrayData]
+      if (inputArray == null) {
+        Nil
+      } else {
+        val rows = new Array[InternalRow](inputArray.numElements())
+        inputArray.foreach(et, (i, e) => {
--- End diff --

Oh, sure. I'll update this, too.


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69114498
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      et.fields.zipWithIndex.foldLeft(new StructType()) { case (schema, (f, i)) =>
--- End diff --

Oh, I see. I'll fix this and remember that. :)


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69109263
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
--- End diff --

Sure.


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69084526
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      et.fields.zipWithIndex.foldLeft(new StructType()) { case (schema, (f, i)) =>
+        schema.add(s"col${i + 1}", f.dataType, nullable = f.nullable)
+      }
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      val inputArray = child.eval(input).asInstanceOf[ArrayData]
+      if (inputArray == null) {
+        Nil
+      } else {
+        val rows = new Array[InternalRow](inputArray.numElements())
+        inputArray.foreach(et, (i, e) => {
--- End diff --

We don't need to materialize the array, do we? We can create an iterator to
return the results.
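
To make the difference concrete, here is a minimal sketch in plain Scala. It rests on assumptions: `Row`, `materialized`, and `lazily` are hypothetical names, and an `Array[Row]` stands in for Spark's `ArrayData`, so this is not the code in the PR.

```scala
object InlineIteratorSketch {
  // Hypothetical stand-in for InternalRow: one output row per struct element.
  type Row = Seq[Any]

  // Materializing variant: allocates and fills a full array of rows up front.
  def materialized(input: Array[Row]): Seq[Row] =
    if (input == null) Nil
    else {
      val rows = new Array[Row](input.length)
      var i = 0
      while (i < input.length) {
        rows(i) = input(i)
        i += 1
      }
      rows.toSeq
    }

  // Lazy variant: each row is looked up only when the consumer asks for it,
  // so no intermediate array is allocated.
  def lazily(input: Array[Row]): Iterator[Row] =
    if (input == null) Iterator.empty
    else Iterator.tabulate(input.length)(i => input(i))
}
```

Both variants yield the same rows in the same order; the iterator version simply avoids the extra allocation, which is what a `TraversableOnce` return type allows.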



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69083901
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
+case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback {
+
+  override def children: Seq[Expression] = child :: Nil
+
+  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
+    case ArrayType(et, _) if et.isInstanceOf[StructType] =>
+      TypeCheckResult.TypeCheckSuccess
+    case _ =>
+      TypeCheckResult.TypeCheckFailure(
+        s"input to function inline should be array of struct type, not ${child.dataType}")
+  }
+
+  override def elementSchema: StructType = child.dataType match {
+    case ArrayType(et : StructType, _) =>
+      et.fields.zipWithIndex.foldLeft(new StructType()) { case (schema, (f, i)) =>
--- End diff --

Every time I see a foldLeft, I feel it would be clearer if it were written in
a more imperative style ... :)

```
var schema = new StructType()
for (i <- et.fields.indices) {
  schema = schema.add(...)
}
schema
```
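
The two styles are equivalent, which a small stand-alone sketch can show. The names here are assumptions made for illustration: `Schema` is a hypothetical immutable stand-in for `StructType` whose `add` returns a new value, and `withFold` / `withLoop` are not part of the PR.

```scala
object SchemaBuildSketch {
  // Hypothetical immutable schema: `add` returns a new Schema, mirroring
  // how the diff reassigns the result of `schema.add(...)`.
  final case class Schema(columns: Vector[String] = Vector.empty) {
    def add(name: String): Schema = copy(columns = columns :+ name)
  }

  // Functional style: the accumulator is threaded through foldLeft.
  def withFold(n: Int): Schema =
    (0 until n).foldLeft(Schema()) { (schema, i) => schema.add(s"col${i + 1}") }

  // Imperative style: a local var is reassigned on each iteration.
  def withLoop(n: Int): Schema = {
    var schema = Schema()
    for (i <- 0 until n) {
      schema = schema.add(s"col${i + 1}")
    }
    schema
  }
}
```

Both functions build the same schema; the choice between them is about readability rather than behavior.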


[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-30 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13976#discussion_r69083645
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -149,3 +149,42 @@ case class Explode(child: Expression) extends UnaryExpression with Generator wit
     }
   }
 }
+
+/**
+ * Explodes an array of structs into a table.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a) - Explodes an array of structs into a table.")
--- End diff --

give an example in extended?



[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...

2016-06-29 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13976

[SPARK-16288][SQL] Implement inline table generating function

## What changes were proposed in this pull request?

This PR implements the `inline` table-generating function.
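
As a rough illustration of the intended semantics (one output row per struct in the input array, and no rows for a NULL array), here is a plain-Scala model. It is a sketch under assumptions, not the implementation in this PR: `Struct` and `inline` below are hypothetical, and `Option` is used to model a NULL array.

```scala
object InlineSemanticsSketch {
  // Hypothetical model of a two-field struct, e.g. struct(1, 'a').
  type Struct = (Int, String)

  // Each input value is an array of structs, or None to model a NULL array.
  // Every struct becomes one output row; a NULL or empty array yields no rows.
  def inline(input: Seq[Option[Seq[Struct]]]): Seq[Struct] =
    input.flatMap(_.getOrElse(Nil))
}
```

For example, an input holding `array(struct(1, 'a'), struct(2, 'b'))` would produce two rows in this model, one per struct.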

## How was this patch tested?

Passes the Jenkins tests with a new test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-16288

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13976.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13976


commit 1add31a1472ada2ac20e676873a4dc88d9c8393f
Author: Dongjoon Hyun 
Date:   2016-06-29T17:37:01Z

[SPARK-16288][SQL] Implement inline table generating function



