[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated MAPREDUCE-6336:
      Resolution: Fixed
   Fix Version/s: 3.0.0
    Hadoop Flags: Incompatible change,Reviewed
          Status: Resolved  (was: Patch Available)
    Release Note:

mapreduce.fileoutputcommitter.algorithm.version now defaults to 2.

In algorithm version 1:

  1. commitTask renames the directory $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/ to $joboutput/_temporary/$appAttemptID/$taskID/
  2. recoverTask renames $joboutput/_temporary/$appAttemptID/$taskID/ to $joboutput/_temporary/($appAttemptID + 1)/$taskID/
  3. commitJob merges every task output file in $joboutput/_temporary/$appAttemptID/$taskID/ into $joboutput/, then deletes $joboutput/_temporary/ and writes $joboutput/_SUCCESS

commitJob's run time and number of RPCs are O(n) in the number of output files, as discussed in MAPREDUCE-4815, and the commit can take minutes.

Algorithm version 2 changes the behavior of commitTask, recoverTask, and commitJob:

  1. commitTask renames all files in $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/ to $joboutput/
  2. recoverTask is, strictly speaking, a no-op, but to handle an upgrade from version 1 to version 2 it checks whether there are any files in $joboutput/_temporary/($appAttemptID - 1)/$taskID/ and renames them to $joboutput/
  3. commitJob deletes $joboutput/_temporary and writes $joboutput/_SUCCESS

Algorithm version 2 takes advantage of task parallelism and makes commitJob itself O(1). However, the window of vulnerability during which $joboutput contains incomplete output is much larger. Pipeline logic that consumes job output should therefore check for the existence of the _SUCCESS marker.

Thanks [~l201514] for the contribution and [~jlowe] for the review! Committed to trunk.
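The difference between the two algorithms described in the release note can be illustrated with a small local simulation. This is only a sketch, not Hadoop's FileOutputCommitter code: it uses Python's pathlib on a local temporary directory as a stand-in for the job's file system, it fixes $appAttemptID at 0, and every function name in it (write_task_output, commit_task_v1, commit_task_v2, commit_job, is_complete, run_job) is hypothetical. Task parallelism and recoverTask are not modeled.

```python
import shutil
import tempfile
from pathlib import Path

ATTEMPT = "_temporary/0"  # assumption: $appAttemptID is fixed at 0


def write_task_output(job_out: Path, task_id: str, files: dict) -> Path:
    """Write a task attempt's files under its private attempt directory."""
    attempt_dir = job_out / ATTEMPT / "_temporary" / f"attempt_{task_id}"
    attempt_dir.mkdir(parents=True)
    for name, text in files.items():
        (attempt_dir / name).write_text(text)
    return attempt_dir


def commit_task_v1(job_out: Path, task_id: str, attempt_dir: Path) -> None:
    """v1: a single directory rename into the committed-task location."""
    attempt_dir.rename(job_out / ATTEMPT / task_id)


def commit_task_v2(job_out: Path, task_id: str, attempt_dir: Path) -> None:
    """v2: move every file straight into the job output directory."""
    for f in attempt_dir.iterdir():
        f.rename(job_out / f.name)


def commit_job(job_out: Path) -> int:
    """Merge committed-task files (only v1 leaves any), delete _temporary,
    write _SUCCESS. Returns the per-file renames done by commitJob itself."""
    renames = 0
    for task_dir in (job_out / ATTEMPT).iterdir():
        if task_dir.name == "_temporary":
            continue  # uncommitted attempt area, not task output
        for f in task_dir.iterdir():
            f.rename(job_out / f.name)
            renames += 1
    shutil.rmtree(job_out / "_temporary")
    (job_out / "_SUCCESS").touch()
    return renames


def is_complete(job_out: Path) -> bool:
    """Consumer-side check: trust the output only once _SUCCESS exists."""
    return (job_out / "_SUCCESS").exists()


def run_job(commit_task, n_tasks: int = 3):
    """Run a toy job; report what a reader could see before commitJob."""
    job_out = Path(tempfile.mkdtemp()) / "out"
    job_out.mkdir()
    for i in range(n_tasks):
        task_id = f"task_{i}"
        attempt_dir = write_task_output(job_out, task_id, {f"part-{i:05d}": "data"})
        commit_task(job_out, task_id, attempt_dir)
    visible = sorted(p.name for p in job_out.iterdir() if p.name.startswith("part-"))
    complete = is_complete(job_out)
    renames = commit_job(job_out)
    return visible, complete, renames, job_out


if __name__ == "__main__":
    for name, fn in (("v1", commit_task_v1), ("v2", commit_task_v2)):
        visible, complete, renames, _ = run_job(fn)
        print(name, "visible before commitJob:", visible,
              "| _SUCCESS present:", complete,
              "| commitJob renames:", renames)
```

In this sketch, version 1 does one rename per output file inside commit_job, while version 2 does none there because the files were already moved at task commit. The cost of that is visible in the same run: under version 2 the part files appear in the output directory before _SUCCESS is written, which is why a consumer should gate on is_complete rather than on the presence of part files.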
Enable v2 FileOutputCommitter by default

                Key: MAPREDUCE-6336
                URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336
            Project: Hadoop Map/Reduce
         Issue Type: Improvement
         Components: mrv2
  Affects Versions: 2.7.0
           Reporter: Gera Shegalov
           Assignee: Siqi Li
             Labels: BB2015-05-TBR
            Fix For: 3.0.0
        Attachments: MAPREDUCE-6336.v1.patch

This JIRA proposes enabling the new FileOutputCommitter behavior from MAPREDUCE-4815 by default in trunk, and potentially in branch-2.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
Allen Wittenauer updated MAPREDUCE-6336:
    Labels: BB2015-05-TBR  (was: )
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
Siqi Li updated MAPREDUCE-6336:
    Attachment: MAPREDUCE-6336.v1.patch
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
Siqi Li updated MAPREDUCE-6336:
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
Gera Shegalov updated MAPREDUCE-6336:
    Assignee: Siqi Li