[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox


turboFei edited a comment on issue #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-616269121


   Thanks, have added an UT for renaming failed during renaming dynamic staging 
files to final files.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add test for rename failed and task aborted cases.
   
   It seems that, if rename failed and an exception thrown or task aborted when 
renaming dynamicStagingFiles.
   The job would abort.
   So, I think it is not necessary to add tests for these cases.
   cc @manuzhang  @Ngone51 
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add test for rename failed and task aborted cases.
   
   It seems that, if an exception thrown or task aborted when renaming 
dynamicStagingFiles.
   The job would abort.
   So, I think it is not necessary to add tests for these cases.
   cc @manuzhang  @Ngone51 
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add test for rename failed and task aborted cases.
   
   It seems that, if an exception thrown or task aborted when renaming 
dynamicStagingFiles.
   The job would abort.
   Is it still necessary to add tests for these cases?
   cc @manuzhang  @Ngone51 
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add test for rename failed and task aborted cases.
   
   It seems that, if an exception thrown or task aborted when renaming 
dynamicStagingFiles.
   The job would abort.
   So, Is it still necessary to add tests for these cases?
   cc @manuzhang  @Ngone51 
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add a test for rename failed case.
   
   It seems that, if an exception thrown or task aborted when renaming 
dynamicStagingFiles.
   The job would abort.
   cc @manuzhang  @Ngone51 
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-19 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-616061133
 
 
   I just tried  to add a test for rename failed case.
   
   It seems that, if an exception thrown or task aborted when renaming 
dynamicStagingFiles.
   The job would abort.
   
https://github.com/apache/spark/blob/74aed8cc8b94cb459ff12a6e8f1f15cb7aea8c40/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L198-L227


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-18 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-613445024
 
 
   Revert to origin code.
   It seems that let all tasks, which have same stageId and taskId, to rename 
their dynamic staging task files is the best plan.
   Otherwise, it is too complex, and may cause data loss.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-14 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-613245483
 
 
   I will fix the docs and add UTs later.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-13 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-613226194
 
 
   > IIUC, only one task attempt should commit at a time. If so, will there 
still be conflicts with task speculation when the first task commit 
successfully ?
   
   Got, It seems that we need judge whether the commit operation performed in 
SparkHadoopMapRedUtil.commitTask.
   Then we need revert the performed commit status if exception happened when 
renaming dynacmicStagingTaskFiles.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-12 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-612717480
 
 
   Hi @Ngone51 
   I have tried to add an ut for HadoopMapReduceCommitProtocol.
   Could you help take a look? Thanks in advance~


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610699108
 
 
   > If the issue is caused by conflict file name, why it is only specific to 
`dynamicPartitionOverwrite`?
   
   Spark use Hadoop's FileOutputCommitter as OutputCommitter class normally.
   There is an working directory, for spark, it is `_temporary/0'.
   Each task has an unique output, like `_temporary/0/taskAtttempt_0**/'.
   Each task also has working directory like 
`_temporary/0/taskAttempt_0**/_temporary`.
   Each task invoke commitTask to commit its output, for algorithm 1, it would 
commit output to `_temporary/0/taskAttempt_**`, and for algorithm 2, it would 
commit output to outputPath directly.
   See detail in 
https://github.com/apache/hadoop/blob/20eec958674a9c343a80c9fccd1383ef7c1b57f5/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L573
   
   But for dynamic partition overwrite, Spark set an unique output named 
`.spark-staging-${UUID}`under tablePath.
   For each task, its output could be 
`.spark-staging-${UUID}/partitionPath1/fileName1`, 
`.spark-staging-${UUID}/partitionPath2/fileName2` ... and  
`.spark-staging-${UUID}/partitionPathn/fileNameN`.
   
   It means that, all tasks share the same working directories.
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610699108
 
 
   > If the issue is caused by conflict file name, why it is only specific to 
`dynamicPartitionOverwrite`?
   
   Spark use Hadoop's FileOutputCommitter as OutputCommitter class normally.
   There is an working directory, for spark, it is `_temporary/0'.
   Each task has an unique output, like `_temporary/0/taskAtttempt_0**/'.
   Each task also has working directory like 
`_temporary/0/taskAttempt_0**/_temporary`.
   Each task invoke commitTask to commit its output, for algorithm 1, it would 
commit output to `_temporary/0/taskAttempt_**`, and for algorithm 2, it would 
commit output to outputPath directly.
   See detail in 
   
   
https://github.com/apache/hadoop/blob/20eec958674a9c343a80c9fccd1383ef7c1b57f5/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L573
   
   But for dynamic partition overwrite, Spark set an unique output named 
`.spark-staging-${UUID}`under tablePath.
   For each task, its output could be 
`.spark-staging-${UUID}/partitionPath1/fileName1`, 
`.spark-staging-${UUID}/partitionPath2/fileName2` ... and  
`.spark-staging-${UUID}/partitionPathn/fileNamen`.
   
   It means that, all tasks share the same working directories.
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610699108
 
 
   > If the issue is caused by conflict file name, why it is only specific to 
`dynamicPartitionOverwrite`?
   
   Spark use Hadoop's FileOutputCommitter as OutputCommitter class normally.
   There is an working directory, for spark, it is `_temporary/0'.
   Each task has an unique output, like `_temporary/0/taskAtttempt_0**/'.
   Each task also has working directory like 
`_temporary/0/taskAttempt_0**/_temporary`.
   Each task invoke commitTask to commit its output, for algorithm 1, it would 
commit output to `_temporary/0/taskAttempt_**`, and for algorithm 2, it would 
commit output to outputPath directly.
   See detail in 
https://github.com/apache/hadoop/blob/20eec958674a9c343a80c9fccd1383ef7c1b57f5/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L573
   
   But for dynamic partition overwrite, Spark set an unique output named 
`.spark-staging-${UUID}`under tablePath.
   For each task, its output could be 
`.spark-staging-${UUID}/partitionPath1/fileName1`, 
`.spark-staging-${UUID}/partitionPath2/fileName2` ... and  
`.spark-staging-${UUID}/partitionPathn/fileNamen`.
   
   It means that, all tasks share the same working directories.
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610699108
 
 
   > If the issue is caused by conflict file name, why it is only specific to 
`dynamicPartitionOverwrite`?
   
   Spark use Hadoop's FileOutputCommitter as OutputCommitter class normally.
   There is an working directory, for spark, it is `_temporary/0'.
   Each task has an unique output, like `_temporary/0/taskAtttempt_0**/'.
   Each task also has working directory like 
`_temporary/0/taskAttempt_0**/_temporary`.
   Each task invoke commitTask to commit its output, for algorithm 1, it would 
commit output to `_temporary/0/taskAttempt_**`, and for algorithm 2, it would 
commit output to outputPath directly(see detail in 
https://github.com/apache/hadoop/blob/20eec958674a9c343a80c9fccd1383ef7c1b57f5/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L573).
   
   But for dynamic partition overwrite, Spark set an unique output named 
`.spark-staging-${UUID}`under tablePath.
   For each task, its output could be 
`.spark-staging-${UUID}/partitionPath1/fileName1`, 
`.spark-staging-${UUID}/partitionPath2/fileName2` ... and  
`.spark-staging-${UUID}/partitionPathn/fileNamen`.
   
   It means that, all tasks share the same working directories.
   
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610361734
 
 
   > I may miss something here but can't we just delete the file if it exists 
at:
   > 
   > 
https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L109
   > 
   > 
   > ?
   
   
   
   No, if speculation is not enabled, we can delete it directly, because there 
is no possible to exist two concurrent tasks which has same output file name.
   Two tasks with same taskId but different attempt id would write a same file.
   
https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L140-L146
   But when speculation is enabled, if we delete it directly, it would cause 
this task failed(for hdfs, the exception maybe no lease on this inode, for 
local file, I can not ensure the data would whether duplicated,  for other 
filesystem ...), then I think it would also launch another new speculation task.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-07 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-610361734
 
 
   > I may miss something here but can't we just delete the file if it exists 
at:
   > 
   > 
https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L109
   > 
   > 
   > ?
   
   
   
   No, if speculation is not enabled, we can delete it directly, because there 
is no possible to exist two concurrent tasks which has same output file name.
   Two tasks with same taskId but different attempt id would write a same file.
   
https://github.com/apache/spark/blob/2ac6163a5d04027ef4dbdf7d031cddf9415ed25e/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L140-L146
   But when speculation is enabled, if we delete it directly, it would cause 
this task failed(for hdfs, the exception maybe no lease on this inode, for 
local file, I can not ensure the data would whether duplicated,  for other 
filesystem ...), then I think it would also launch a new speculation task.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2020-04-04 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-608508212
 
 
   tried to add an ut.
   But I can not ensure that a speculation task would be launched. Thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task

2019-10-31 Thread GitBox
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix 
the issue that for dynamic partition overwrite a task would conflict with its 
speculative task
URL: https://github.com/apache/spark/pull/26339#issuecomment-548260473
 
 
   cc @cloud-fan  @advancedxy @viirya  @wangyum 
   Can you help take a look? Thanks in advance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org