MaxGekk opened a new pull request #31884:
URL: https://github.com/apache/spark/pull/31884


   ### What changes were proposed in this pull request?
   For all built-in datasources, prohibit saving of the year-month and day-time
interval types that were introduced by SPARK-27793. We plan to support saving of
such types in milestone 2, see SPARK-27790.
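
   The check described above can be illustrated with a minimal, self-contained sketch. This is not Spark's actual implementation — the ADT and names below are hypothetical — but it shows the idea of rejecting interval types at analysis time, recursively over nested schemas, instead of failing late inside a write task:
   ```scala
   // Hypothetical, simplified model of Spark's DataType hierarchy,
   // used only to illustrate the recursive interval check.
   object IntervalCheck {
     sealed trait DataType
     case object IntegerType extends DataType
     case object YearMonthIntervalType extends DataType
     case object DayTimeIntervalType extends DataType
     case class ArrayType(elementType: DataType) extends DataType
     case class StructType(fields: Seq[DataType]) extends DataType

     // True if an interval type appears anywhere in the (possibly nested) type.
     def hasIntervalType(dt: DataType): Boolean = dt match {
       case YearMonthIntervalType | DayTimeIntervalType => true
       case ArrayType(elem)    => hasIntervalType(elem)
       case StructType(fields) => fields.exists(hasIntervalType)
       case _                  => false
     }

     // Fail fast with a clear message, analogous to the AnalysisException
     // this PR raises before the write job is even launched.
     def validateSchemaForWrite(schema: StructType): Unit =
       require(!hasIntervalType(schema),
         "Cannot save interval data types into external storage.")
   }
   ```
   Doing the check once against the schema at analysis time is cheap, whereas the old behavior only surfaced the problem per-row inside the writer on the executors.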
   
   ### Why are the changes needed?
   To improve the user experience with Spark SQL and print a nicer error message.
The current error message might confuse users:
   ```
   scala> 
Seq(java.time.Period.ofMonths(1)).toDF.write.mode("overwrite").json("/Users/maximgekk/tmp/123")
   21/03/18 22:44:35 ERROR FileFormatWriter: Aborting job 
8de402d7-ab69-4dc0-aa8e-14ef06bd2d6b.
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 
1) (192.168.1.66 executor driver): org.apache.spark.SparkException: Task failed 
while writing rows.
        at 
org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:418)
        at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
        at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1437)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.RuntimeException: Failed to convert value 1 (class of 
class java.lang.Integer}) with the type of YearMonthIntervalType to JSON.
        at scala.sys.package$.error(package.scala:30)
        at 
org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$23(JacksonGenerator.scala:179)
        at 
org.apache.spark.sql.catalyst.json.JacksonGenerator.$anonfun$makeWriter$23$adapted(JacksonGenerator.scala:176)
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. After the changes, the example above fails with a clear analysis error:
   ```
   scala> 
Seq(java.time.Period.ofMonths(1)).toDF.write.mode("overwrite").json("/Users/maximgekk/tmp/123")
   org.apache.spark.sql.AnalysisException: Cannot save the 'year-month 
interval' data type into external storage.
   ```
   
   ### How was this patch tested?
   Manually by saving year-month and day-time intervals to the JSON datasource.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
