steveloughran commented on pull request #33828:
URL: https://github.com/apache/spark/pull/33828#issuecomment-1064041710
* Propose using `spark.sql.sources.writeJobUUID` as the job ID when set; it gives more uniqueness and it should be set everywhere.
* Core design looks OK, but I don't see why you couldn't support concurrent jobs just by having different subdirectories of `_temporary` for different job IDs/UUIDs, plus an option to disable cleanup (and instructions on doing it later, which you'd need anyway). A sketch of both of these points follows the list.
* That use of `_temporary/0` in the file output committer exists only because, on a restart, the MR AM lets the committer use `_temporary/1` (using the app attempt number for the subdirectory), then move the committed task data from job attempt 0 into its own dir, so recovering all existing work. Spark doesn't need that.
* It'd be good for you to try out my manifest committer against HDFS with your workloads. It is designed to be a lot faster in job commit because all listing of task output directory trees is done in task commit, and job commit does everything in parallel: listing of manifests, loading of manifests, creating dest dirs, file rename (see the loose sketch below). Some of the options you don't need for HDFS (parallel delete of task attempt temp dirs), but I still expect a massive speedup of job commit, though not as much as for stores where listing and rename are slower.
* The reason I don't explicitly target HDFS is that it means I can cut out that testing/QE and focus on abfs and gcs, using benchmarks from there to tune the algorithm. For example, it turns out that mkdirs on gcs is slow, so you should check for existence first; that is now done in task commit, which adds duplicate probes there. But knowing abfs does async page prefetch on a `listStatusIterator()` call, I can make the `getFileStatus(destDir)` call after the list call and have it done while the first page of list results is coming in (sketched below): https://github.com/steveloughran/hadoop/blob/mr/MAPREDUCE-7341-manifest-committer/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/committer/manifest/stages/TaskAttemptScanDirectoryStage.java#L150

Numbers for HDFS would only distract me, but you will see much faster parallel job commits on "real world" partitioned trees.
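A minimal sketch of the first two points, assuming the committer has a Hadoop `JobContext` and that Spark has published `spark.sql.sources.writeJobUUID` into the job's Hadoop configuration; the `resolveJobUUID` and `jobAttemptPath` helper names are hypothetical:

```java
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;

public final class JobUUIDSketch {

  /** Prefer the UUID Spark sets in the job conf; fall back to a random one. */
  static String resolveJobUUID(JobContext context) {
    Configuration conf = context.getConfiguration();
    String uuid = conf.getTrimmed("spark.sql.sources.writeJobUUID", "");
    return uuid.isEmpty() ? UUID.randomUUID().toString() : uuid;
  }

  /** Per-job subdirectory of _temporary, so concurrent jobs never collide. */
  static Path jobAttemptPath(Path outputDir, String jobUUID) {
    return new Path(outputDir, "_temporary/" + jobUUID);
  }
}
```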
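On the parallel job commit: this is only a loose sketch of the idea, not the manifest committer's actual code. `TaskManifest` stands in for whatever the per-task manifest type is, and `loadManifest` is a hypothetical loader:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ParallelCommitSketch {

  static class TaskManifest { /* files + dirs recorded at task commit */ }

  // Hypothetical loader: deserialize one task's manifest from the filesystem.
  static TaskManifest loadManifest(FileSystem fs, Path manifestFile) {
    throw new UnsupportedOperationException("sketch only");
  }

  /** Load every task manifest concurrently instead of walking the output tree. */
  static List<TaskManifest> loadAllManifests(
      FileSystem fs, List<Path> manifestFiles, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<TaskManifest>> futures = new ArrayList<>();
      for (Path p : manifestFiles) {
        futures.add(pool.submit(() -> loadManifest(fs, p)));
      }
      List<TaskManifest> manifests = new ArrayList<>();
      for (Future<TaskManifest> f : futures) {
        manifests.add(f.get());
      }
      return manifests;
    } finally {
      pool.shutdownNow();
    }
  }
}
```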
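And the overlapped-IO ordering from the last bullet, roughly as the linked `TaskAttemptScanDirectoryStage` describes it: kick off the listing first (abfs prefetches the next page asynchronously), then issue the existence probe so it runs while the first page of list results is in flight. A sketch against the standard Hadoop `FileSystem` API, with the manifest-recording step elided:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public final class OverlappedProbeSketch {

  /** Returns whether destDir already exists; the answer feeds later dir creation. */
  static boolean scanTaskAttemptDir(FileSystem fs, Path taskAttemptDir, Path destDir)
      throws IOException {
    // Start the listing first: on abfs the next page is prefetched asynchronously.
    RemoteIterator<FileStatus> listing = fs.listStatusIterator(taskAttemptDir);

    // Probe the destination dir while the first page of results is coming in,
    // so a slow-store mkdirs() can be skipped later when the dir already exists.
    boolean destDirExists;
    try {
      fs.getFileStatus(destDir);
      destDirExists = true;
    } catch (FileNotFoundException e) {
      destDirExists = false;
    }

    while (listing.hasNext()) {
      FileStatus status = listing.next();
      // ... record the file in the task's manifest (elided) ...
    }
    return destDirExists;
  }
}
```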
