[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-05-27 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-6336:
-
   Resolution: Fixed
Fix Version/s: 3.0.0
 Release Note: 
mapreduce.fileoutputcommitter.algorithm.version now defaults to 2.
  
In algorithm version 1:

  1. commitTask renames the directory
  $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/
  to
  $joboutput/_temporary/$appAttemptID/$taskID/

  2. recoverTask renames
  $joboutput/_temporary/$appAttemptID/$taskID/
  to
  $joboutput/_temporary/($appAttemptID + 1)/$taskID/

  3. commitJob merges every task output file in
  $joboutput/_temporary/$appAttemptID/$taskID/
  into
  $joboutput/, then deletes $joboutput/_temporary/
  and writes $joboutput/_SUCCESS

commitJob's run time, measured in RPCs, is O(n) in the number of output files, 
because the job itself moves every file after all tasks have finished. As 
discussed in MAPREDUCE-4815, this can take minutes. 
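
The version 1 steps and their O(n) job-level cost can be modeled with a minimal local-filesystem sketch. It is not Hadoop's FileOutputCommitter: it assumes Python's pathlib in place of Hadoop's FileSystem API, assumes each task writes flat files only, and the helper names (task_attempt_dir, commit_task_v1, recover_task_v1, commit_job_v1) and the attempt/task ID strings are hypothetical. commit_job_v1 returns the number of per-file renames it performed.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical local-filesystem model of the version 1 commit layout.
# Hadoop performs these steps through its FileSystem API; this sketch uses
# pathlib and assumes every task writes flat files (no subdirectories).


def task_attempt_dir(out: Path, app_attempt: int, task_attempt: str) -> Path:
    """$joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/"""
    return out / "_temporary" / str(app_attempt) / "_temporary" / task_attempt


def commit_task_v1(out: Path, app_attempt: int, task_attempt: str, task_id: str) -> None:
    # Step 1: a single directory rename per task.
    src = task_attempt_dir(out, app_attempt, task_attempt)
    src.rename(out / "_temporary" / str(app_attempt) / task_id)


def recover_task_v1(out: Path, app_attempt: int, task_id: str) -> None:
    # Step 2: carry a committed task over into the next application attempt.
    dst = out / "_temporary" / str(app_attempt + 1) / task_id
    dst.parent.mkdir(parents=True, exist_ok=True)
    (out / "_temporary" / str(app_attempt) / task_id).rename(dst)


def commit_job_v1(out: Path, app_attempt: int) -> int:
    # Step 3: one rename per output file, done serially by the job itself,
    # which is where the O(n) cost comes from. Returns the rename count.
    renames = 0
    attempt_dir = out / "_temporary" / str(app_attempt)
    for task_dir in sorted(attempt_dir.iterdir()):
        if task_dir.name == "_temporary":  # task-attempt scratch area, not committed output
            continue
        for f in sorted(task_dir.iterdir()):
            f.rename(out / f.name)
            renames += 1
    shutil.rmtree(out / "_temporary")
    (out / "_SUCCESS").touch()
    return renames


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        for t in range(3):  # three hypothetical map tasks, one file each
            attempt = f"attempt_m_{t}_0"
            scratch = task_attempt_dir(out, 0, attempt)
            scratch.mkdir(parents=True)
            (scratch / f"part-m-{t:05d}").write_text("data\n")
            commit_task_v1(out, 0, attempt, f"task_m_{t}")
        print("renames in commitJob:", commit_job_v1(out, 0))
        print(sorted(p.name for p in out.iterdir()))
```

With n output files, the loop in commit_job_v1 performs n renames in a single process after every task has finished, which is the serial O(n) step described above.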

Algorithm version 2 changes the behavior of commitTask, recoverTask, and 
commitJob.

  1. commitTask renames all files in
  $joboutput/_temporary/$appAttemptID/_temporary/$taskAttemptID/
  to $joboutput/

  2. Strictly speaking, recoverTask is a no-op, but to handle
  an upgrade from version 1 to version 2, it checks whether there
  are any files in
  $joboutput/_temporary/($appAttemptID - 1)/$taskID/
  and renames them to $joboutput/

  3. commitJob deletes $joboutput/_temporary and writes
  $joboutput/_SUCCESS
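
The version 2 steps can be sketched the same way, again assuming a local filesystem, flat task output, and hypothetical helper names (task_attempt_dir, commit_task_v2, recover_task_v2, commit_job_v2). The per-file renames move into commit_task_v2, which each task runs for its own files, while commit_job_v2 only removes the scratch tree and writes the marker.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical local-filesystem model of the version 2 commit layout:
# pathlib instead of Hadoop's FileSystem API, flat task output files only.


def task_attempt_dir(out: Path, app_attempt: int, task_attempt: str) -> Path:
    return out / "_temporary" / str(app_attempt) / "_temporary" / task_attempt


def _move_files(src: Path, out: Path) -> int:
    # Rename every file in src directly into the final output directory.
    moved = 0
    for f in sorted(src.iterdir()):
        f.rename(out / f.name)
        moved += 1
    return moved


def commit_task_v2(out: Path, app_attempt: int, task_attempt: str) -> int:
    # Step 1: each task moves its own files straight into $joboutput, so the
    # per-file renames are spread across tasks instead of done by the job.
    return _move_files(task_attempt_dir(out, app_attempt, task_attempt), out)


def recover_task_v2(out: Path, app_attempt: int, task_id: str) -> int:
    # Step 2: nothing to do unless a version 1 attempt left committed output
    # under $joboutput/_temporary/($appAttemptID - 1)/$taskID/.
    legacy = out / "_temporary" / str(app_attempt - 1) / task_id
    return _move_files(legacy, out) if legacy.is_dir() else 0


def commit_job_v2(out: Path) -> None:
    # Step 3: delete the scratch tree and write the marker; no per-file renames.
    shutil.rmtree(out / "_temporary", ignore_errors=True)
    (out / "_SUCCESS").touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        for t in range(3):
            attempt = f"attempt_m_{t}_0"
            scratch = task_attempt_dir(out, 1, attempt)
            scratch.mkdir(parents=True)
            (scratch / f"part-m-{t:05d}").write_text("data\n")
            commit_task_v2(out, 1, attempt)
        # Task output is already visible here, before commitJob has run.
        print(sorted(p.name for p in out.iterdir()))
        commit_job_v2(out)
        print(sorted(p.name for p in out.iterdir()))
```

The first listing in the demo already contains the part files but no _SUCCESS, which illustrates the wider window for incomplete output that the next paragraph warns about.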

Algorithm 2 takes advantage of task parallelism and makes commitJob itself 
O(1). However, because each task moves its files into $joboutput as soon as it 
commits, the window in which the $joboutput directory holds incomplete output 
is much larger. Therefore, pipeline logic for consuming job outputs should 
check for the existence of the _SUCCESS marker.
 Hadoop Flags: Incompatible change,Reviewed
   Status: Resolved  (was: Patch Available)
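
Following the release note's advice, a consumer can refuse to read an output directory until _SUCCESS appears. This is a minimal sketch on a local path; the function names are hypothetical, and treating names that begin with "_" or "." as side files rather than job output is an assumption of this sketch.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical consumer-side guard: only read a job's output directory once
# the _SUCCESS marker exists. Names starting with "_" or "." are skipped as
# side files (an assumption of this sketch).


def output_is_complete(out: Path) -> bool:
    return (out / "_SUCCESS").is_file()


def list_job_output(out: Path) -> List[str]:
    if not output_is_complete(out):
        raise RuntimeError(f"{out} has no _SUCCESS marker; output may be partial")
    return sorted(p.name for p in out.iterdir() if not p.name.startswith(("_", ".")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        out = Path(d)
        (out / "part-r-00000").write_text("data\n")  # e.g. committed by a v2 task
        print(output_is_complete(out))
        (out / "_SUCCESS").touch()  # written by commitJob
        print(list_job_output(out))
```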

Thanks [~l201514] for the contribution, and [~jlowe] for the review! Committed to 
trunk.

 Enable v2 FileOutputCommitter by default
 

 Key: MAPREDUCE-6336
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 2.7.0
Reporter: Gera Shegalov
Assignee: Siqi Li
  Labels: BB2015-05-TBR
 Fix For: 3.0.0

 Attachments: MAPREDUCE-6336.v1.patch


 This JIRA proposes enabling the new FileOutputCommitter behavior from 
 MAPREDUCE-4815 by default in trunk, and potentially in branch-2. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-05-05 Thread Allen Wittenauer (JIRA)

Allen Wittenauer updated MAPREDUCE-6336:

Labels: BB2015-05-TBR  (was: )


[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-04-24 Thread Siqi Li (JIRA)

Siqi Li updated MAPREDUCE-6336:
---
Attachment: MAPREDUCE-6336.v1.patch


[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-04-24 Thread Siqi Li (JIRA)

Siqi Li updated MAPREDUCE-6336:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-04-23 Thread Gera Shegalov (JIRA)

Gera Shegalov updated MAPREDUCE-6336:
-
Assignee: Siqi Li
