[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18460
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144120938

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala

```diff
@@ -131,14 +131,17 @@ class TypeCoercionSuite extends AnalysisTest {
       widenFunc: (DataType, DataType) => Option[DataType],
       t1: DataType,
       t2: DataType,
-      expected: Option[DataType]): Unit = {
+      expected: Option[DataType],
+      isSymmetric: Boolean = true): Unit = {
```

End diff

@gatorsmile. I extended this function to support non-symmetric tests and addressed your comments.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144082145

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

```diff
@@ -100,6 +101,17 @@ object TypeCoercion {
     case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) =>
       Some(TimestampType)

+    case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if t1.sameType(t2) =>
+      Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+        // Since `t1.sameType(t2)` is true, two StructTypes have the same DataType
+        // except `name` (in case of `spark.sql.caseSensitive=false`) and `nullable`.
+        // - Different names: use a lower case name because findTightestCommonType is commutative.
+        // - Different nullabilities: `nullable` is true iff one of them is nullable.
+        val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)
+        val dataType = findTightestCommonType(f1.dataType, f2.dataType).get
+        StructField(name, dataType, nullable = f1.nullable || f2.nullable)
```

End diff

Please see [TypeCoercionSuite.checkWidenType](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala#L130-L142). In order to use the first type's name, we would need to loosen this test helper function and break its existing commutativity assumption. I'm OK with that if you want.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144078834

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `StructField(name, dataType, nullable = f1.nullable || f2.nullable)`

Sure, right. It's for commutativity.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144075681

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above

```
val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)
```

The above code changes the case, right?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143908510

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `StructField(name, dataType, nullable = f1.nullable || f2.nullable)`

This PR works as you want; this function is only used to compare type equality. BTW, this function should lower-case (or upper-case) the name because it has to be commutative.

```
scala> sql("SELECT struct(1 a) UNION ALL (SELECT struct(2 A))").printSchema
root
 |-- named_struct(a, 1 AS `a`): struct (nullable = false)
 |    |-- a: integer (nullable = false)

scala> sql("SELECT struct(1 A) UNION ALL (SELECT struct(2 a))").printSchema
root
 |-- named_struct(A, 1 AS `A`): struct (nullable = false)
 |    |-- A: integer (nullable = false)
```
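[Editor's note] The commutativity argument above can be sketched outside Spark. The following is a minimal standalone model, not Spark's actual `StructField` class; `Field` and `mergeField` are hypothetical names, but the merge rule mirrors the patched code: equal names are kept, case-insensitively equal names are lower-cased, and the result is nullable iff either input is nullable.

```scala
import java.util.Locale

// Hypothetical simplified model of a struct field (not Spark's StructField).
case class Field(name: String, nullable: Boolean)

// Merge rule from the patch:
// - identical names are kept as-is
// - names differing only in case are lower-cased, so the output does not
//   depend on argument order
// - the merged field is nullable iff either input is nullable
def mergeField(f1: Field, f2: Field): Field = {
  val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)
  Field(name, f1.nullable || f2.nullable)
}
```

Swapping the arguments yields the same `Field`, which is exactly the commutativity that `checkWidenType` asserts.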
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143906721

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `StructField(name, dataType, nullable = f1.nullable || f2.nullable)`

See the example:

```Scala
sql("SELECT 1 as a UNION ALL (SELECT 1 as A)").show()
sql("SELECT 1 as A UNION ALL (SELECT 1 as a)").show()
```
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143906288

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `StructField(name, dataType, nullable = f1.nullable || f2.nullable)`

Should we follow what we are doing for `union`/`except`/`intersect` and always pick the name of the head one?
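[Editor's note] The "pick the head name" alternative discussed here is order-dependent, which is the commutativity concern raised elsewhere in the thread. A minimal sketch, using a hypothetical simplified field model (not Spark's classes):

```scala
// Hypothetical simplified field model (not Spark's StructField).
case class Field(name: String, nullable: Boolean)

// Alternative merge that always keeps the first (head) field's name, the way
// Union/Except/Intersect output naming works. Unlike the lower-casing rule in
// the patch, this is NOT commutative: swapping the inputs changes the name.
def mergeKeepHeadName(f1: Field, f2: Field): Field =
  Field(f1.name, f1.nullable || f2.nullable)
```

With inputs named `"A"` and `"a"`, `mergeKeepHeadName` returns different results depending on argument order, so a test helper that checks both orderings against one expected type would fail.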
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143906011

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val dataType = findTightestCommonType(f1.dataType, f2.dataType).get`

This does not handle complex types well, right? e.g. multi-layer nested types.
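[Editor's note] The recursive call to `findTightestCommonType` on each field's `dataType` is what covers multi-layer nesting: when a field is itself a struct, the struct case fires again one level down. A minimal self-contained sketch with a hypothetical, heavily simplified type model (not Spark's `DataType` classes):

```scala
import java.util.Locale

// Hypothetical simplified type model, just to show the recursion.
sealed trait DT
case object IntT extends DT
// (field name, field type, nullable)
case class StructT(fields: List[(String, DT, Boolean)]) extends DT

def tightest(t1: DT, t2: DT): Option[DT] = (t1, t2) match {
  case (IntT, IntT) => Some(IntT)
  case (StructT(f1), StructT(f2)) if f1.length == f2.length =>
    val merged = f1.zip(f2).map { case ((n1, d1, b1), (n2, d2, b2)) =>
      val name = if (n1 == n2) n1 else n1.toLowerCase(Locale.ROOT)
      // The recursive call here is what handles arbitrarily deep nesting.
      tightest(d1, d2).map(dt => (name, dt, b1 || b2))
    }
    if (merged.forall(_.isDefined)) Some(StructT(merged.map(_.get))) else None
  case _ => None
}
```

A two-level struct whose inner field names differ only in case merges field-by-field at both levels, and the result is the same regardless of argument order.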
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143295197

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

Yes, it works. I updated the [test cases](https://github.com/apache/spark/pull/18460/files#diff-5a2e7f03d14856c8769fd3ddea8742bdR2657).
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143289255

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

Oh, I see. I thought so. Let me check that again.

> I mean, does the case-sensitivity conf work in nested column name resolution?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143288998

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

For Hive, your example works like the following:

```
hive> CREATE TABLE S1 AS SELECT named_struct('A',1) Col1;
hive> SELECT S1.Col1.a, S1.Col1.A FROM S1;
OK
1	1
```
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143288861

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

I mean, does the case-sensitivity conf work in nested column name resolution?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143288219

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

When case sensitivity is off, Spark treats the names as lower case. For example, in [the test case](https://github.com/apache/spark/pull/18460/files#diff-5a2e7f03d14856c8769fd3ddea8742bdR2657), we need the table names `struct1` and `struct2`; with just `a = A` it raises an ambiguous-column exception.

```
checkAnswer(sql("SELECT * FROM struct1, struct2 WHERE struct1.a = struct2.A"), Seq.empty)
```
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143285500

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

Do both `tab.Col1.a` and `tab.Col1.A` work well when case sensitivity is off?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143273532

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

For a test case, does `nested column in the references` mean the `WHERE` clause?
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r143273000

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala, same hunk as quoted above, on the line `val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT)`

Yes. Hive behaves like the following:

```sql
hive> CREATE TABLE S AS SELECT named_struct('A',1);
hive> DESCRIBE S;
OK
_c0	struct<a:int>
```