[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598148#comment-16598148 ] Hudson commented on HADOOP-15107: -
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14855 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14855/])
HADOOP-15107. Stabilize/tune S3A committers; review correctness & docs. (stevel: rev 5a0babf76550f63dad4c17173c4da2bf335c6532)
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Invoker.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/S3ACommitterFactory.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/integration/ITestStagingCommitProtocol.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitter.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/PartitionedStagingCommitter.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/magic/ITestMagicCommitProtocol.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committers.md
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/DirectoryStagingCommitter.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/magic/ITMagicCommitMRJob.java
* (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/staging/integration/ITStagingCommitMRJobBadDest.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractCommitITest.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/Paths.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitMRJob.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java
* (edit) hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java
* (add) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/ITestS3ACommitterFactory.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java

> Stabilize/tune S3A committers; review correctness & docs
>
>
> Key: HADOOP-15107
> URL: https://issues.apache.org/jira/browse/HADOOP-15107
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.1.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Blocker
> Fix For: 3.1.2
>
> Attachments: HADOOP-15107-001.patch, HADOOP-15107-002.patch,
> HADOOP-15107-003.patch, HADOOP-15107-004.patch
>
>
> I'm writing about the paper on the committers, one which, being a proper
> paper, requires me to show the committers work.
> # define the requirements of a "Correct" committed job (this applies to the
> FileOutputCommitter too)
> # show that the Staging committer meets these requirements (most of this is
> implicit in that it uses the V1 FileOutputCommitter to marshall .pendingset
> lists from committed tasks to the final destination, where they are read and
> committed.
> # Show the magic committer also works.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597511#comment-16597511 ] Steve Loughran commented on HADOOP-15107: -
Thanks, committed to 3.1.x & trunk!

bq. New commit attempts always get the same attempt id +1? (I don't know how those are allocated)

It's how they know to recover from the previous attempt. The YARN application ID is used to guarantee uniqueness across applications. Spark always starts off with 0 as its app ID/attempt ID, so more than one query from different Spark instances can clash.

bq. The mergePathsV1 seems pretty straightforward. Not sure why the actual code is so complicated.

Unplanned evolution is my guess, possibly with a goal of not breaking any explicit subclasses of the FileOutputCommitter. I didn't try that for the new commit stuff.

v2 resilience? It is broken in that nothing can handle a task which fails during commit: its (non-atomic) state is unknown. Neither MapReduce nor Spark is aware of, or resilient to, this issue. There's also the partitioning failure mode: the task doesn't fail during commit, it merely hangs for a while (GC?) and then completes its commit when it resumes, without noticing that it has been superseded by a second task attempt, or indeed that the entire job has now completed. Oops. Bear in mind, though, that outside object stores with slow renames, the probability of a failure during task commit is likely to be low. Really, committers should expose their semantics here so that MR & Spark can handle this failure condition.

V1 doesn't have this problem, as its task commit is atomic. Its job commit is not, but because {{isCommitJobRepeatable()}} returns false for V1, an MR AM restart knows to give up at that point (something is saved to HDFS to indicate that a job commit is in progress). Spark doesn't restart a failed AM/driver, so it's moot there.

S3A committers
* Staging: relies on V1 semantics in the cluster HDFS.
* Magic: task commit writes a {{PendingSet}} of all the task's files to commit in a single atomic PUT; task commit is therefore also atomic.
After a job completes we purge all pending uploads under $dest, so any failed tasks' output is deleted.
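The magic committer behaviour described in the reply above can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not the S3A implementation: `FakeStore` is a hypothetical stand-in for S3 in which an upload stays invisible until it is completed (like an uncompleted multipart upload), and the `.pendingset` manifests live under a hypothetical `job_dir` prefix. It shows the two properties claimed: task commit is a single PUT and therefore atomic, and job commit completes only the uploads listed in committed pendingsets, then aborts everything else still pending under the destination.

```python
import json


class FakeStore:
    """Hypothetical in-memory stand-in for S3: uploads stay invisible until completed."""

    def __init__(self):
        self.objects = {}   # key -> bytes: visible objects
        self.pending = {}   # upload id -> (key, data): started but not completed
        self._next_id = 0

    def start_upload(self, key, data):
        self._next_id += 1
        upload_id = "u%d" % self._next_id
        self.pending[upload_id] = (key, data)
        return upload_id

    def complete_upload(self, upload_id):
        key, data = self.pending.pop(upload_id)
        self.objects[key] = data

    def abort_uploads_under(self, prefix):
        stale = [u for u, (key, _) in self.pending.items() if key.startswith(prefix)]
        for upload_id in stale:
            del self.pending[upload_id]

    def put(self, key, data):
        """A single PUT: the object appears all at once, or not at all."""
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)


def task_attempt(store, dest, files):
    """Task attempt: upload each output file to its final key, but do not complete it."""
    return [store.start_upload(dest + "/" + name, data) for name, data in files.items()]


def commit_task(store, job_dir, task_id, upload_ids):
    """Task commit: one PUT of the task's pendingset, so it is atomic."""
    store.put(job_dir + "/" + task_id + ".pendingset", json.dumps(upload_ids).encode())


def commit_job(store, dest, job_dir):
    """Job commit: complete the uploads in every pendingset, then purge the rest under dest."""
    manifests = sorted(k for k in store.objects
                       if k.startswith(job_dir + "/") and k.endswith(".pendingset"))
    for key in manifests:
        for upload_id in json.loads(store.objects[key]):
            store.complete_upload(upload_id)
        store.delete(key)
    # Uploads not named in any committed pendingset (e.g. from failed attempts) are aborted.
    store.abort_uploads_under(dest + "/")


store = FakeStore()
task_attempt(store, "out", {"part-0000-a0": b"stale"})        # attempt 0 fails, never commits
ids = task_attempt(store, "out", {"part-0000-a1": b"good"})   # attempt 1 succeeds
commit_task(store, "jobs/job1", "task-0", ids)
commit_job(store, "out", "jobs/job1")
print(sorted(store.objects))  # ['out/part-0000-a1']
print(store.pending)          # {}
```

The failed first attempt's data is never visible at any point: its upload is simply aborted by the final purge, which is why a failed task needs no explicit cleanup in this model.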
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596784#comment-16596784 ] Aaron Fabbri commented on HADOOP-15107: ---
+1 on the v4 patch. Code all looks good.

{noformat}
+} else {
+  LOG.warn("Using standard FileOutputCommitter to commit work."
+      + " This is slow and potentially unsafe.");
+  return createFileOutputCommitter(outputPath, context);
{noformat}

Good call, I like it.

On the docs changes, just some random questions:

{noformat}
```python
def recoverTask(tac):
  oldAttemptId = appAttemptId - 1
{noformat}

Interesting. New commit attempts always get the same attempt id +1? (I don't know how those are allocated)

The mergePathsV1 seems pretty straightforward. Not sure why the actual code is so complicated. Your pseudocode representation seems fairly intuitive: overwrite anything that exists in the destination, recursively, so you don't just nuke directories that exist in the destination, but instead descend and remove destination conflicts (files) as they arise. Special case: if src is a file but dest is a directory, nuke dest.

{noformat}
### v2 Job Recovery Before `commitJob()`

Because the data has been renamed into the destination directory, all tasks recorded as having been committed need no recovery at all:

```python
def recoverTask(tac):
```

All active and queued tasks are scheduled for execution.

There is a weakness here, the same one as on a failure during `commitTask()`: it is only safe to repeat a task which failed during that commit operation if the names of all generated files are constant across all task attempts.

If the Job AM fails while a task attempt has been instructed to commit, and that commit is not recorded as having completed, the state of that in-progress task is unknown... really, it isn't safe to recover the job at this point.
{noformat}

Interesting. What happens in this case? Is it detected? Do we get duplicate data in the final job (re-attempt) output?
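The `mergePathsV1` behaviour summarised in the review comment above translates fairly directly into code. This is a minimal sketch against the local filesystem using Python's `os` and `shutil`, not Hadoop's `FileSystem` API or the actual `FileOutputCommitter` source; `merge_paths` and `delete` are hypothetical names chosen for illustration.

```python
import os
import shutil


def delete(path):
    """Recursively delete a file or directory, like a recursive FileSystem delete."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def merge_paths(src, dest):
    """Move src to dest, merging directories and letting source entries win conflicts.

    Hypothetical local-filesystem sketch of the V1 merge described in the review.
    """
    if not os.path.isdir(src):
        # Source is a file: whatever is at dest (a file or a whole directory) is replaced.
        if os.path.lexists(dest):
            delete(dest)
        os.rename(src, dest)
    elif not os.path.lexists(dest):
        # Nothing in the way: a single rename moves the whole subtree.
        os.rename(src, dest)
    elif not os.path.isdir(dest):
        # Source is a directory but dest is a file: replace the file.
        delete(dest)
        os.rename(src, dest)
    else:
        # Both are directories: descend, so unrelated dest entries survive and only
        # conflicting paths are replaced.
        for name in os.listdir(src):
            merge_paths(os.path.join(src, name), os.path.join(dest, name))
```

The recursion is what preserves unrelated files already present in a destination directory; only the paths that actually conflict are deleted before each rename.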
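The v2 weakness quoted in the review above, and the V1/V2 contrast drawn in the reply, can be modelled with sets of visible file names. This is a deliberately simplified sketch under stated assumptions: a V1 task commit is treated as one atomic directory rename (so an attempt that dies before it never reaches `commit_task_v1`), a V2 task commit as one rename per file straight into the destination, and `CrashDuringCommit`, `attempt_names` and the other helpers are hypothetical. It does not model the "hang, then resume" partitioning failure.

```python
class CrashDuringCommit(Exception):
    """Hypothetical: raised to simulate a task failing part-way through commitTask()."""


def commit_task_v1(committed_tasks, task_id, attempt_files):
    """V1: one rename of the attempt directory, so the task is committed whole or not at all."""
    committed_tasks[task_id] = list(attempt_files)


def commit_job_v1(dest, committed_tasks):
    """V1 job commit: merge the output of every committed task into the destination."""
    for files in committed_tasks.values():
        dest.update(files)


def commit_task_v2(dest, attempt_files, crash_after=None):
    """V2: one rename per file directly into the destination, so a crash leaves partial output."""
    for i, name in enumerate(attempt_files):
        if crash_after is not None and i == crash_after:
            raise CrashDuringCommit(name)
        dest.add(name)


def attempt_names(attempt, unique_names):
    """File names written by one attempt; some writers embed the attempt number."""
    suffix = "-a%d" % attempt if unique_names else ""
    return ["part-0" + suffix, "part-1" + suffix]


def v2_retry(unique_names):
    """Attempt 0 crashes after renaming one of its two files; attempt 1 then commits."""
    dest = set()
    try:
        commit_task_v2(dest, attempt_names(0, unique_names), crash_after=1)
    except CrashDuringCommit:
        pass  # the task is rescheduled as a new attempt
    commit_task_v2(dest, attempt_names(1, unique_names))
    return dest


def v1_retry(unique_names):
    """Attempt 0 dies before its rename, so only attempt 1 is ever recorded."""
    committed_tasks = {}
    commit_task_v1(committed_tasks, "task-0", attempt_names(1, unique_names))
    dest = set()
    commit_job_v1(dest, committed_tasks)
    return dest


print(sorted(v2_retry(unique_names=True)))   # ['part-0-a0', 'part-0-a1', 'part-1-a1']
print(sorted(v2_retry(unique_names=False)))  # ['part-0', 'part-1']
print(sorted(v1_retry(unique_names=True)))   # ['part-0-a1', 'part-1-a1']
```

With attempt-specific names, the retried V2 task leaves the first attempt's partially committed file behind, i.e. duplicate data in the output; with names that are constant across attempts the retry overwrites it, which matches the quoted doc's condition for a repeat being safe.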
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596683#comment-16596683 ] Aaron Fabbri commented on HADOOP-15107: ---
I don't want to rob others of the joys of learning the new committers, but I can review the code (patch) today.
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596618#comment-16596618 ] Steve Loughran commented on HADOOP-15107: -
I want this to go into Hadoop 3.2; there's no significant change in semantics here other than better resilience, error reporting and a quieter abort phase. Can I get some reviews? This is a great opportunity to learn about the commit mechanism.
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569843#comment-16569843 ] genericqa commented on HADOOP-15107:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 45s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 31s | trunk passed |
| +1 | compile | 28m 33s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 1m 25s | trunk passed |
| +1 | shadedclient | 13m 12s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 51s | trunk passed |
| +1 | javadoc | 1m 7s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 27m 45s | the patch passed |
| +1 | javac | 27m 45s | the patch passed |
| +1 | checkstyle | 0m 24s | the patch passed |
| +1 | mvnsite | 1m 24s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 3s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 4s | the patch passed |
| +1 | javadoc | 1m 6s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 49s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 4m 27s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 129m 25s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15107 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934423/HADOOP-15107-004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b727c202a825 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bcfc985 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace |
[jira] [Commented] (HADOOP-15107) Stabilize/tune S3A committers; review correctness & docs
[ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567754#comment-16567754 ] genericqa commented on HADOOP-15107:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 41s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 32s | trunk passed |
| +1 | compile | 27m 55s | trunk passed |
| +1 | checkstyle | 0m 23s | trunk passed |
| +1 | mvnsite | 1m 24s | trunk passed |
| +1 | shadedclient | 12m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 48s | trunk passed |
| +1 | javadoc | 1m 5s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 19s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 4s | the patch passed |
| +1 | compile | 26m 58s | the patch passed |
| +1 | javac | 26m 58s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 1m 24s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 1s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 49s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 4m 37s | hadoop-aws in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 127m 24s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | HADOOP-15107 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934192/HADOOP-15107-003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 55a50e6cd99b 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 40ab8ee |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace |