ulysses-you commented on code in PR #38939:
URL: https://github.com/apache/spark/pull/38939#discussion_r1042983373
```diff
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -785,7 +785,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase {
   def taskFailedWhileWritingRowsError(cause: Throwable): Throwable = {
     new SparkException(
       errorClass = "_LEGACY_ERROR_TEMP_2054",
-      messageParameters = Map.empty,
+      messageParameters = Map("message" -> cause.getMessage),
```
Review Comment:
@cloud-fan after we introduced `WriteFiles`, the error stack for writing changed.

before:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15) (10.221.97.76 executor driver): java.lang.RuntimeException: Exceeds char/varchar type length limitation: 5
    at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.trimTrailingSpaces(CharVarcharCodegenUtils.java:30)
    at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.charTypeWriteSideCheck(CharVarcharCodegenUtils.java:43)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
```
after:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15) (10.221.97.76 executor driver): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:789)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:416)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:89)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1502)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Exceeds char/varchar type length limitation: 5
    at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.trimTrailingSpaces(CharVarcharCodegenUtils.java:30)
    at org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils.charTypeWriteSideCheck(CharVarcharCodegenUtils.java:43)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
```
As you can see, the top-level message in the "after" trace is only the generic "Task failed while writing rows."; the root cause is buried in the `Caused by:` chain. So I added the root cause's message to the wrapped `SparkException`.
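To illustrate the effect outside Spark's error-class framework, here is a minimal, self-contained Scala sketch. The `WrapCauseDemo` object and the plain `Exception` wrapping are hypothetical stand-ins for the real plumbing; the point is only what threading `cause.getMessage` into the wrapper buys:

```scala
// Hypothetical sketch; the actual fix goes through Spark's error-class
// framework (errorClass + messageParameters), not string concatenation.
object WrapCauseDemo {
  def main(args: Array[String]): Unit = {
    val cause = new RuntimeException("Exceeds char/varchar type length limitation: 5")

    // Before: messageParameters = Map.empty, so the wrapper's message is generic.
    val before = new Exception("Task failed while writing rows.", cause)

    // After: messageParameters = Map("message" -> cause.getMessage), so the
    // root-cause detail surfaces in the wrapper's own message.
    val after = new Exception(s"Task failed while writing rows. ${cause.getMessage}", cause)

    println(before.getMessage) // Task failed while writing rows.
    println(after.getMessage)  // Task failed while writing rows. Exceeds char/varchar type length limitation: 5
  }
}
```

With this, anything that only reports the top-level exception message (task failure summaries, UI, logs that don't print the full cause chain) still surfaces the root cause.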