[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2020-01-05 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r363111332
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/TestUDT.scala
 ##
 @@ -58,4 +63,22 @@ object TestUDT {
 
     override def equals(other: Any): Boolean = other.isInstanceOf[MyDenseVectorUDT]
   }
+
+  private[sql] class MyXMLGregorianCalendarUDT extends UserDefinedType[XMLGregorianCalendar] {
+    override def sqlType: DataType = TimestampType
+
+    override def serialize(obj: XMLGregorianCalendar): Any =
+      obj.toGregorianCalendar.getTimeInMillis * 1000
+
+    override def deserialize(datum: Any): XMLGregorianCalendar = {
+      val calendar = new GregorianCalendar
+      calendar.setTimeInMillis(datum.asInstanceOf[Long])
+      DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
+    }
+
+    override def userClass: Class[XMLGregorianCalendar] = classOf[XMLGregorianCalendar]
+
+    // By setting this to a timestamp, we lose the information about the udt
+    override private[sql] def jsonValue: JValue = "timestamp"
 
 Review comment:
   That is correct. So when we serialize it, it will use:
   ```scala
   override def serialize(obj: XMLGregorianCalendar): Any =
     obj.toGregorianCalendar.getTimeInMillis * 1000
   ```
   which writes the value as a timestamp stored as the number of microseconds since the epoch of 1970-01-01T00:00:00.00Z (UTC+00:00).
   
   This serialize implementation must match the one for the `sqlType`: because we don't store any reference to the UDT, another Spark session will simply decode the column as a `TimestampType`, so the `TimestampType` decoder will be used.
   
   I know that this isn't trivial, but it is very powerful and makes UDTs much more flexible when writing ETL jobs, since you can use your own types directly with the Dataset API.
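   As a standalone illustration of the conversion described above (outside Spark), a minimal round trip might look like this; the `MicrosRoundTrip` object and its helper names are hypothetical, and deserialization here divides the stored microseconds back to milliseconds since `GregorianCalendar` only carries millisecond precision:
   ```scala
   import java.util.GregorianCalendar
   import javax.xml.datatype.{DatatypeFactory, XMLGregorianCalendar}

   object MicrosRoundTrip {
     // Epoch microseconds, matching how TimestampType stores values.
     def toMicros(obj: XMLGregorianCalendar): Long =
       obj.toGregorianCalendar.getTimeInMillis * 1000L

     // Convert microseconds back, dropping sub-millisecond precision.
     def fromMicros(micros: Long): XMLGregorianCalendar = {
       val calendar = new GregorianCalendar
       calendar.setTimeInMillis(micros / 1000L)
       DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
     }

     def main(args: Array[String]): Unit = {
       val original = DatatypeFactory.newInstance
         .newXMLGregorianCalendar(new GregorianCalendar(2020, 0, 5))
       val micros = toMicros(original)
       // The round trip preserves the instant at millisecond precision.
       assert(fromMicros(micros).toGregorianCalendar.getTimeInMillis ==
         original.toGregorianCalendar.getTimeInMillis)
     }
   }
   ```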


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2020-01-05 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r363110958
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -592,6 +597,12 @@ object StructType extends AbstractDataType {
   case (leftUdt: UserDefinedType[_], rightUdt: UserDefinedType[_])
     if leftUdt.userClass == rightUdt.userClass => leftUdt
 
+  case (leftType, rightUdt: UserDefinedType[_])
+    if leftType == rightUdt.sqlType => leftType
+
+  case (leftUdt: UserDefinedType[_], rightType)
+    if leftUdt.sqlType == rightType => rightType
 
 Review comment:
   Well, `UserDefinedType` extends `DataType`, just like `TimestampType`, `StringType`, and every other type. The point is that a UserDefinedType can be made compatible with another type. For example, it is allowed to merge an int into a long. This is an explicit choice by the developer.
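   To make the two added merge cases concrete, here is a hedged, self-contained toy model of the logic; the names (`DType`, `Udt`, `merge`) are invented for illustration and are not Spark's actual API:
   ```scala
   object MergeModel {
     sealed trait DType
     case object TimestampT extends DType
     case object LongT extends DType
     final case class Udt(sqlType: DType) extends DType

     def merge(left: DType, right: DType): DType = (left, right) match {
       // Two UDTs over the same underlying type stay a UDT.
       case (l: Udt, r: Udt) if l.sqlType == r.sqlType => l
       // The added cases: a native type absorbs a UDT whose sqlType matches.
       case (l, r: Udt) if l == r.sqlType => l
       case (l: Udt, r) if l.sqlType == r => r
       // Identical native types trivially merge.
       case (l, r) if l == r => l
       case _ => throw new IllegalArgumentException(s"Cannot merge $left and $right")
     }

     def main(args: Array[String]): Unit = {
       // A UDT over timestamps merges with a plain timestamp to the plain type.
       assert(merge(TimestampT, Udt(TimestampT)) == TimestampT)
       assert(merge(Udt(TimestampT), TimestampT) == TimestampT)
     }
   }
   ```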





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-12-02 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r353026018
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   After a good night of sleep, I've come up with an integration test that mimics the behavior in Delta.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-12-02 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r352771008
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   @maropu I'm unable to reproduce the issue without Delta. Since plain Spark does not check schema compatibility on write, I'm unable to implement an end-to-end integration test. I tried to work around this by taking the union of the two DFs, but that takes another branch in the codebase, and it failed on `TypeCoercion`.
   
   Please advise.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-27 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r351535108
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   Ok, I've rewritten the test, and now it fails on current master:
   ```
   scala> val df = gregorianCalendarDf.union(timestampDf)
   org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. timestamp <> timestamp at the first column of the second table;;
   'Union
   :- LogicalRDD [dt#11], false
   +- LogicalRDD [dt#14], false
   
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$12$$anonfun$apply$13.apply(CheckAnalysis.scala:294)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$12$$anonfun$apply$13.apply(CheckAnalysis.scala:291)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$12.apply(CheckAnalysis.scala:291)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$12.apply(CheckAnalysis.scala:280)
     at scala.collection.immutable.List.foreach(List.scala:392)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
     at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:86)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
     at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
     at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
     at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
     at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
     at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
     at org.apache.spark.sql.Dataset.withSetOperator(Dataset.scala:3424)
     at org.apache.spark.sql.Dataset.union(Dataset.scala:1862)
     ... 40 elided
   ```
   I'm creating two DFs, both representing time, one of them as a UDT, and trying to merge them. This is very similar to what I'm doing with Delta: with Delta, even using `SaveMode.Overwrite`, it will still check compatibility against the existing schema.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-27 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r351430247
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   Interesting. When appending to the table, Spark does not check compatibility on write; when using Delta, it does.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-27 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r351293879
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   You're right, let me investigate.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-27 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r351154890
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   Ok, let's remove the rule, and I'll retrigger the test.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-27 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r351154695
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -453,7 +453,23 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
    * 3. If B doesn't exist in `this`, it's also included in the result schema.
    * 4. Otherwise, `this` and `that` are considered as conflicting schemas and an exception would be
    *    thrown.
+   *
+   * Function to merge the two DataTypes that is compatible with the left
+   * and right side. The order is:
+   *
+   * 1. Merge Arrays, where the type of the Arrays should be compatible
+   * 2. Merge Maps, where the type of the Maps should be compatible
+   * 3. Merge Structs, where the struct is recursively checked for compatibility
+   * 4. Merge DecimalType, where the scale and precision should be equal
+   * 5. Merge UserDefinedType, where the underlying class is equal
+   * 6. Merge UserDefinedType into DataType, where the underlying DataType is equal
+   * 7. Merge DataType, where we check if the DataTypes are compatible
+   * 8. Unable to determine compatibility or incompatible, throw exception
+   *
+   * @throws org.apache.spark.SparkException In case the DataTypes are incompatible
+   * @return The compatible DataType that supports both left and right
    */
+  @throws(classOf[SparkException])
 
 Review comment:
   Sure, I've removed it for now.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-26 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350698612
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -453,7 +453,23 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
    * 3. If B doesn't exist in `this`, it's also included in the result schema.
    * 4. Otherwise, `this` and `that` are considered as conflicting schemas and an exception would be
    *    thrown.
+   *
 
 Review comment:
   Okay, I've added a statement to the existing comment





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-26 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350674098
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -453,7 +453,23 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
    * 3. If B doesn't exist in `this`, it's also included in the result schema.
    * 4. Otherwise, `this` and `that` are considered as conflicting schemas and an exception would be
    *    thrown.
+   *
+   * Function to merge the two DataTypes that is compatible with the left
+   * and right side. The order is:
+   *
+   * 1. Merge Arrays, where the type of the Arrays should be compatible
+   * 2. Merge Maps, where the type of the Maps should be compatible
+   * 3. Merge Structs, where the struct is recursively checked for compatibility
+   * 4. Merge DecimalType, where the scale and precision should be equal
+   * 5. Merge UserDefinedType, where the underlying class is equal
+   * 6. Merge UserDefinedType into DataType, where the underlying DataType is equal
+   * 7. Merge DataType, where we check if the DataTypes are compatible
+   * 8. Unable to determine compatibility or incompatible, throw exception
+   *
+   * @throws org.apache.spark.SparkException In case the DataTypes are incompatible
+   * @return The compatible DataType that supports both left and right
    */
+  @throws(classOf[SparkException])
 
 Review comment:
   If the schema isn't compatible, it will throw a `SparkException`. I like these functions to be annotated, but if you don't like it, I'm happy to remove it.
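   As a general illustration of the annotation being discussed (not Spark code; the names below are invented), `@throws` on a Scala method records the exception in the compiled method signature, which matters for Java interop, though Scala itself does not enforce it:
   ```scala
   object ThrowsDemo {
     // The @throws annotation adds a `throws` clause to the bytecode
     // signature; it is documentation plus Java interop, not a check.
     @throws(classOf[IllegalArgumentException])
     def parsePositive(s: String): Int = {
       val n = s.toInt
       if (n <= 0) throw new IllegalArgumentException(s"not positive: $s")
       n
     }

     def main(args: Array[String]): Unit = {
       assert(parsePositive("42") == 42)
     }
   }
   ```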





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-26 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350673743
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/TestUDT.scala
 ##
 @@ -58,4 +63,22 @@ object TestUDT {
 
     override def equals(other: Any): Boolean = other.isInstanceOf[MyDenseVectorUDT]
   }
+
+  private[sql] class MyXMLGregorianCalendarUDT extends UserDefinedType[XMLGregorianCalendar] {
+    override def sqlType: DataType = TimestampType
+
+    override def serialize(obj: XMLGregorianCalendar): Any =
+      obj.toGregorianCalendar.getTimeInMillis * 1000
+
+    override def deserialize(datum: Any): XMLGregorianCalendar = {
+      val calendar = new GregorianCalendar
+      calendar.setTimeInMillis(datum.asInstanceOf[Long])
+      DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
+    }
+
+    override def userClass: Class[XMLGregorianCalendar] = classOf[XMLGregorianCalendar]
+
+    // By setting this to a timestamp, we lose the information about the udt
+    override private[sql] def jsonValue: JValue = "timestamp"
 
 Review comment:
   With the normal UDT `jsonValue`, we would get:
   ```scala
   override private[sql] def jsonValue: JValue = {
     ("type" -> "udt") ~
       ("class" -> this.getClass.getName) ~
       ("pyClass" -> pyUDT) ~
       ("sqlType" -> sqlType.jsonValue)
   }
   ```
   This writes the type as the UDT. If you later try to read the column in another job where the `UDTRegistration` hasn't been done, you will get an error. Therefore we would like to write the `XMLGregorianCalendar` as a plain timestamp. However, when we append to the table, we want to be able to merge the `XMLGregorianCalendar` UDT into the timestamp. Without the added rule this wouldn't be possible. Hope this helps.
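   For a rough illustration (a sketch assuming Spark's usual schema-JSON layout; the class name shown is hypothetical), a single timestamp column would serialize along these lines with the default UDT `jsonValue` versus the override:
   ```json
   {"type": "struct", "fields": [{"name": "dt",
     "type": {"type": "udt",
              "class": "org.apache.spark.sql.types.TestUDT$MyXMLGregorianCalendarUDT",
              "pyClass": null,
              "sqlType": "timestamp"},
     "nullable": true, "metadata": {}}]}

   {"type": "struct", "fields": [{"name": "dt",
     "type": "timestamp",
     "nullable": true, "metadata": {}}]}
   ```
   The second form contains no reference to the UDT, so any session can read it without registering the UDT class.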





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-26 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350672550
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -287,4 +293,63 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
     checkAnswer(spark.createDataFrame(data, schema).selectExpr("typeof(a)"),
       Seq(Row("array")))
   }
+
+  test("Allow merge UserDefinedType into a native DataType") {
 
 Review comment:
   Perfect, much cleaner. Thanks for suggesting it.
   
   I'm pretty sure that it will fail without the fix: it will fail on the second write. Since we've saved the `XMLGregorianCalendarImpl` as a timestamp (this is why we've overridden `jsonValue`), Spark will recognize it as a regular timestamp. At the next append we're trying to merge an `XMLGregorianCalendar` into a timestamp, which isn't allowed without the additional rule.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-25 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350270807
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
 ##
 @@ -478,4 +479,33 @@ class DataTypeSuite extends SparkFunSuite {
 
     assert(result === expected)
   }
+
+  test("Allow UserDefinedType to be merged into a native DateType") {
+    // In this testcase we have stored the XMLGregorianCalendar as a Timestamp
+    // And we want to read this later on as a regular TimestampType
+    val left = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    // We should be able to merge these types since the
+    // CustomXMLGregorianCalendarType maps to a TimestampTypes
+    val right = StructType(
+      StructField("a", new MyXMLGregorianCalendarUDT) :: Nil)
+
+    // We expect the DataType to be returned instead of the UserDefinedType
+    assert(left.merge(right) === left)
+  }
+
+  test("Disallow DataType to be merged into UserDefinedType") {
+    val right = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    val left = StructType(
+      StructField("a", new MyXMLGregorianCalendarUDT) :: Nil)
+
+    // We don't allow DataTypes being mapped into a UserDefinedType.
+    // Main reason is we don't see any application in this for now.
+    intercept[SparkException] {
+      left.merge(right) === left
+    }
 
 Review comment:
   Sure





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-25 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350213716
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
 ##
 @@ -478,4 +479,33 @@ class DataTypeSuite extends SparkFunSuite {
 
     assert(result === expected)
   }
+
+  test("Allow UserDefinedType to be merged into a native DateType") {
+    // In this testcase we have stored the XMLGregorianCalendar as a Timestamp
+    // And we want to read this later on as a regular TimestampType
+    val left = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    // We should be able to merge these types since the
+    // CustomXMLGregorianCalendarType maps to a TimestampTypes
+    val right = StructType(
+      StructField("a", new MyXMLGregorianCalendarUDT) :: Nil)
+
+    // We expect the DataType to be returned instead of the UserDefinedType
+    assert(left.merge(right) === left)
+  }
+
+  test("Disallow DataType to be merged into UserDefinedType") {
+    val right = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    val left = StructType(
+      StructField("a", new MyXMLGregorianCalendarUDT) :: Nil)
 Review comment:
   Sure





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-25 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350213677
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
 ##
 @@ -478,4 +479,33 @@ class DataTypeSuite extends SparkFunSuite {
 
     assert(result === expected)
   }
+
+  test("Allow UserDefinedType to be merged into a native DateType") {
+    // In this testcase we have stored the XMLGregorianCalendar as a Timestamp
+    // And we want to read this later on as a regular TimestampType
+    val left = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    // We should be able to merge these types since the
+    // CustomXMLGregorianCalendarType maps to a TimestampTypes
+    val right = StructType(
+      StructField("a", new MyXMLGregorianCalendarUDT) :: Nil)
+
+    // We expect the DataType to be returned instead of the UserDefinedType
+    assert(left.merge(right) === left)
+  }
+
+  test("Disallow DataType to be merged into UserDefinedType") {
+    val right = StructType(
+      StructField("a", TimestampType) :: Nil)
 
 Review comment:
   Sure





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-25 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350213508
 
 

 ##
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/TestUDT.scala
 ##
 @@ -58,4 +63,22 @@ object TestUDT {
 
 override def equals(other: Any): Boolean = 
other.isInstanceOf[MyDenseVectorUDT]
   }
+
+  private[sql] class MyXMLGregorianCalendarUDT extends 
UserDefinedType[XMLGregorianCalendar] {
 
 Review comment:
   No, this isn't possible, since we need to override `jsonValue` to return the native DataType. Otherwise, it will be caught by: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L592





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-25 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r350212404
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -526,6 +526,25 @@ object StructType extends AbstractDataType {
   case _ => dt
 }
 
+  /**
+   * Function to merge the two DataTypes that is compatible with the left
+   * and right side. The order is:
+   *
+   * 1. Merge Arrays, where the type of the Arrays should be compatible
+   * 2. Merge Maps, where the type of the Maps should be compatible
+   * 3. Merge Structs, where the struct is recursively checked for compatibility
+   * 4. Merge DecimalType, where the scale and precision should be equal
+   * 5. Merge UserDefinedType, where the underlying class is equal
+   * 6. Merge UserDefinedType into DataType, where the underlying DataType is equal
+   * 7. Merge DataType, where we check if the DataTypes are compatible
+   * 8. Unable to determine compatibility or incompatible, throw exception
 
 Review comment:
   Sure
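
   For readers skimming the thread, the ordering above can be sketched with a toy, self-contained model (the ADT and `merge` below are illustrative, not Spark's API, and only a few of the eight cases are modelled):

   ```scala
   // Toy model of the merge ordering: a UDT wraps a native type, and the
   // pattern match tries the most specific cases first, falling through
   // to an exception when the two types are incompatible.
   sealed trait DT
   case object TimestampT extends DT
   case object StringT extends DT
   case class ArrayT(elem: DT) extends DT
   case class UdtT(underlying: DT) extends DT

   def merge(left: DT, right: DT): DT = (left, right) match {
     case (ArrayT(l), ArrayT(r))       => ArrayT(merge(l, r)) // case 1: arrays, element types merged recursively
     case (UdtT(l), UdtT(r)) if l == r => UdtT(l)             // case 5: two UDTs over the same underlying type
     case (l, UdtT(u)) if l == u       => l                   // case 6: a UDT collapses into its native DataType
     case (l, r) if l == r             => l                   // case 7: equal native types are trivially compatible
     case _ =>                                                // case 8: incompatible
       throw new IllegalArgumentException(s"Cannot merge $left and $right")
   }
   ```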





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-24 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r349941019
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala
 ##
 @@ -17,15 +17,22 @@
 
 package org.apache.spark.sql
 
-import java.util.Arrays
+import java.nio.file.Files
+import java.sql.Timestamp
+import java.util.{Arrays, GregorianCalendar}
 
+import com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl
+import javax.xml.datatype.XMLGregorianCalendar
 
 Review comment:
   Thank you for the pointers. I've fixed the style errors.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-24 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r349918766
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
 ##
 @@ -478,4 +483,33 @@ class DataTypeSuite extends SparkFunSuite {
 
 assert(result === expected)
   }
+
+  test("SPARK-30004: Allow UserDefinedType to be merged into a standard DataType") {
+    class CustomXMLGregorianCalendarType extends UserDefinedType[XMLGregorianCalendar] {
+      override def sqlType: DataType = TimestampType
+
+      override def serialize(obj: XMLGregorianCalendar): Any =
+        obj.toGregorianCalendar.getTimeInMillis * 1000
+
+      override def deserialize(datum: Any): XMLGregorianCalendar = {
+        val calendar = new GregorianCalendar
+        calendar.setTimeInMillis(datum.asInstanceOf[Long])
+        DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
+      }
+
+      override def userClass: Class[XMLGregorianCalendar] = classOf[XMLGregorianCalendar]
+
+      override private[sql] def jsonValue: JValue = "timestamp"
+    }
+
+    val left = StructType(
+      StructField("a", TimestampType) :: Nil)
+
+    // We should be able to merge these types since the
+    // CustomXMLGregorianCalendarType maps to a TimestampType
+    val right = StructType(
+      StructField("a", new CustomXMLGregorianCalendarType) :: Nil)
+
+    assert(left.merge(right) === left)
 
 Review comment:
   The current implementation isn't symmetrical: we can merge a UserDefinedType into its underlying native DataType, but not the other way around. I'm hesitant to add that functionality because I don't see any obvious applications, but please let me know if you think it should be added as well. I've added a test that checks the opposite case too, including some additional comments to clarify the idea.
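
   A tiny sketch of that asymmetry (illustrative names, not Spark's API): the UDT collapses into its underlying native type only when it appears on the right-hand side of the merge:

   ```scala
   import scala.util.Try

   // One-way merge: a UDT on the right is unwrapped to its native type,
   // but a native type on the left never widens into a UDT.
   sealed trait DT
   case object TimestampT extends DT
   case class UdtT(underlying: DT) extends DT

   def merge(left: DT, right: DT): DT = (left, right) match {
     case (l, UdtT(u)) if l == u => l // native on the left absorbs the UDT
     case (l, r) if l == r       => l
     case _ => throw new IllegalArgumentException(s"Cannot merge $left with $right")
   }

   val merged = merge(TimestampT, UdtT(TimestampT)) // succeeds
   val reverseFails = Try(merge(UdtT(TimestampT), TimestampT)).isFailure // the reverse throws
   ```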





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-24 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r349908548
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
 ##
 @@ -478,4 +483,33 @@ class DataTypeSuite extends SparkFunSuite {
 
 assert(result === expected)
   }
+
+  test("SPARK-30004: Allow UserDefinedType to be merged into a standard DataType") {
 
 Review comment:
   Thank you, I wasn't aware of this convention.





[GitHub] [spark] Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType

2019-11-24 Thread GitBox
Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow 
merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r349908533
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
 ##
 @@ -592,6 +592,9 @@ object StructType extends AbstractDataType {
   case (leftUdt: UserDefinedType[_], rightUdt: UserDefinedType[_])
 if leftUdt.userClass == rightUdt.userClass => leftUdt
 
+  case (leftType, rightUdt: UserDefinedType[_])
 
 Review comment:
   I've added a Scaladoc to the private merge function. I think the Javadoc you're pointing to describes the function at a different level; for example, it doesn't mention UDTs at all. Let me know what you think.

