GitHub user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/15562#discussion_r84220219
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteOutput.scala ---
@@ -408,17 +416,6 @@ object WriteOutput extends Logging {
job.getConfiguration.setBoolean("mapred.task.is.map", true)
job.getConfiguration.setInt("mapred.task.partition", 0)
- // This UUID is sent to executor side together with the serialized `Configuration` object within
- // the `Job` instance. `OutputWriters` on the executor side should use this UUID to generate
- // unique task output files.
- // This UUID is used to avoid output file name collision between different appending write jobs.
- // These jobs may belong to different SparkContext instances. Concrete data source
- // implementations may use this UUID to generate unique file names (e.g.,
- // `part-r-<task-id>-<job-uuid>.parquet`). The reason why this ID is used to identify a job
- // rather than a single task output file is that, speculative tasks must generate the same
- // output file name as the original task.
--- End diff ---
We should probably preserve this comment and move it to the new place where
we generate the UUID.
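
For context, a minimal sketch of the pattern the deleted comment describes: the driver generates one UUID per write job, stores it in the Hadoop `Configuration` (which is serialized with the `Job` and shipped to executors), and output writers embed it in file names. The config key name `spark.sql.sources.writeJobUUID` and the helper names below are assumptions for illustration, not the exact code in this PR.

    import java.util.UUID
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    // Driver side: generate a job-scoped UUID and stash it in the
    // Configuration so every executor sees the same value.
    // (Key name is assumed; Spark used "spark.sql.sources.writeJobUUID"
    // around this era.)
    def tagJobWithUUID(job: Job): Unit = {
      val uniqueWriteJobId = UUID.randomUUID().toString
      job.getConfiguration.set("spark.sql.sources.writeJobUUID", uniqueWriteJobId)
    }

    // Executor side: build the output file name from the task id plus the
    // job UUID. Because neither value depends on the attempt number, a
    // speculative attempt of the same task produces the same file name as
    // the original attempt, while two different appending jobs (possibly
    // from different SparkContexts) never collide.
    def outputFileName(conf: Configuration, taskId: Int): String = {
      val jobUUID = conf.get("spark.sql.sources.writeJobUUID")
      f"part-r-$taskId%05d-$jobUUID.parquet"
    }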