[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21713


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200324941
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
--- End diff --

yea it's wrong after this patch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200319604
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
--- End diff --

The test result might be wrong?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200315553
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
--- End diff --

Seems like the behavior will be changed if we reuse there, and the test 
[TypeCoercionSuite.scala#L397-L400](https://github.com/apache/spark/blob/bed6849dbf51e1981772cd353ce1a7ae4f0626e2/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala#L397-L400)
 fails.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200281741
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
--- End diff --

Maybe `findTightestCommonType` can also reuse `mergeComplexTypes` ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200277231
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
+  mergeFunc: (DataType, DataType) => Option[DataType]): 
Option[DataType] = (t1, t2) match {
+case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
+  mergeFunc(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, 
valueContainsNull2)) =>
+  mergeFunc(kt1, kt2).flatMap { kt =>
+mergeFunc(vt1, vt2).map { vt =>
+  MapType(kt, vt, valueContainsNull1 || valueContainsNull2)
+}
+  }
+case (StructType(fields1), StructType(fields2)) if fields1.length == 
fields2.length =>
+  val resolver = SQLConf.get.resolver
+  fields1.zip(fields2).foldLeft(Option(new StructType())) {
+case (Some(struct), (field1, field2)) if resolver(field1.name, 
field2.name) =>
--- End diff --

Ah, I see. `findTightestCommonType` only picks first field name when 
caseSensitive = false. This does the same thing actually.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200275396
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
+  t1: DataType,
+  t2: DataType,
+  mergeFunc: (DataType, DataType) => Option[DataType]): 
Option[DataType] = (t1, t2) match {
+case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
+  mergeFunc(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, 
valueContainsNull2)) =>
+  mergeFunc(kt1, kt2).flatMap { kt =>
+mergeFunc(vt1, vt2).map { vt =>
+  MapType(kt, vt, valueContainsNull1 || valueContainsNull2)
+}
+  }
+case (StructType(fields1), StructType(fields2)) if fields1.length == 
fields2.length =>
+  val resolver = SQLConf.get.resolver
+  fields1.zip(fields2).foldLeft(Option(new StructType())) {
+case (Some(struct), (field1, field2)) if resolver(field1.name, 
field2.name) =>
--- End diff --

In case the names between two fields are different, 
`findTightestCommonType` uses the name of first field. This seems stricter?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-05 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200251401
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -166,6 +166,30 @@ object TypeCoercion {
 case (l, r) => None
   }
 
+  private def mergeComplexTypes(
--- End diff --

nit: I would use similar naming conversion such as `findWiderTypeForComplex`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21713#discussion_r200109841
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -185,6 +185,15 @@ object TypeCoercion {
   MapType(kt, vt, valueContainsNull1 || valueContainsNull2)
 }
   }
+case (StructType(fields1), StructType(fields2)) if fields1.length 
== fields2.length =>
--- End diff --

is it time to reduce code duplication now?
```
def mergeComplexTypes(
t1: DataType,
t2: DataType,
mergeFunc: (DataType, DataType) => Option[DataType]): Option[DataType] 
= (t1, t2) match {
  case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
mergeFunc(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
  ...
}

```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21713: [SPARK-24737][SQL] Type coercion between StructTy...

2018-07-04 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/21713

[SPARK-24737][SQL] Type coercion between StructTypes.

## What changes were proposed in this pull request?

We can support type coercion between `StructType`s where all the internal 
types are compatible.

## How was this patch tested?

Added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark 
issues/SPARK-24737/structtypecoercion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21713


commit 5b9b9013826d126b8d7ce986515f395d147acb91
Author: Takuya UESHIN 
Date:   2018-07-04T08:25:50Z

Type coercion between StructTypes.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org