ajithme edited a comment on issue #24142: [SPARK-27194][core] Job failures when 
task attempts do not clean up spark-staging parquet files
URL: https://github.com/apache/spark/pull/24142#issuecomment-474866759
 
 
   So, as we can see from the stack trace, when ``spark.sql.sources.partitionOverwriteMode=DYNAMIC`` 
is set and overwrite is enabled in ``InsertIntoHadoopFsRelationCommand``,
   ```scala
       val enableDynamicOverwrite =
         sparkSession.sessionState.conf.partitionOverwriteMode == PartitionOverwriteMode.DYNAMIC
       // This config only makes sense when we are overwriting a partitioned dataset with dynamic
       // partition columns.
       val dynamicPartitionOverwrite = enableDynamicOverwrite && mode == SaveMode.Overwrite &&
         staticPartitions.size < partitionColumns.length
   ```
   we write directly to the final location via ``DynamicPartitionWriteTask``, which fails when 
it tries to create the output file because of a file left over from the previous task attempt. 
This is because 
``org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter#recordWriter`` 
always calls 
``org.apache.parquet.hadoop.ParquetOutputFormat#getRecordWriter(Configuration, Path, 
org.apache.parquet.hadoop.metadata.CompressionCodecName)``, which always constructs an 
``org.apache.parquet.hadoop.ParquetFileWriter`` with 
``org.apache.parquet.hadoop.ParquetFileWriter.Mode#CREATE``.
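
   As a rough illustration (not the actual Parquet code path, just an analogous sketch that uses 
the Hadoop ``FileSystem`` API directly, with a hypothetical partition path), ``Mode#CREATE`` 
amounts to a non-overwriting create, which cannot succeed once the previous attempt's file is 
already there:
   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}

   object CreateModeSketch {
     def main(args: Array[String]): Unit = {
       val conf = new Configuration()
       // Hypothetical leftover output file from a failed task attempt.
       val file = new Path("/tmp/target_table/dt=2019-03-01/part-00000.parquet")
       val fs: FileSystem = file.getFileSystem(conf)

       // First attempt: the file does not exist yet, so a non-overwriting
       // create (what Mode#CREATE boils down to) succeeds.
       fs.create(file, /* overwrite = */ false).close()

       // Retry attempt: the leftover file is still there, so the same
       // non-overwriting create fails with an "already exists" error.
       fs.create(file, false).close()
     }
   }
   ```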
   
   So for the retry task, ``getRecordWriter`` will never succeed.
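
   For reference, a minimal sketch (output path and data are hypothetical) of a write that takes 
this dynamic-overwrite path, so that a retried task attempt can run into the leftover file 
described above:
   ```scala
   import org.apache.spark.sql.SparkSession

   object DynamicOverwriteRepro {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("dynamic-overwrite-repro")
         .config("spark.sql.sources.partitionOverwriteMode", "DYNAMIC")
         .getOrCreate()
       import spark.implicits._

       // Hypothetical data set, partitioned by a date column.
       val df = Seq((1, "2019-03-01"), (2, "2019-03-02")).toDF("id", "dt")

       // SaveMode.Overwrite with no static partition values, so
       // dynamicPartitionOverwrite above evaluates to true and the write
       // goes through the code path described in this comment.
       df.write
         .mode("overwrite")
         .partitionBy("dt")
         .parquet("/tmp/target_table")
     }
   }
   ```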
