[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054562#comment-14054562
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:


Here are the potential solutions I can think of, and their problems:
# YARN informs the AM that it is the last retry as part of AM start-up or via the 
register API
# YARN informs the AM that this is the last retry as part of AM unregister
# YARN has a way to run a separate cleanup container once it knows for sure 
that the application has exhausted all its attempts

(1) is not really possible. At best, the RM can say that this 
'mayBeTheLastAttempt'. So the AM cannot really assume that this is the last retry, 
and so cannot do things like cleaning up the staging directory.

(2) is good enough for the successful code path. In fact, we already have a way of 
telling the AM that unregister succeeded and that this is indeed the last 
retry; we don't need a new API. If the RM crashes or fails over before that, the app 
will get a new retry anyway. The downside of this approach is that there are many 
cases where the app's last retry may crash (say, with an OOM) and so never clean up 
stale files. In fact, any solution that relies on such RM-AM communication will 
not really solve those corner cases.

(3) acknowledges that the problem of cleaning up stale files cannot be solved 
without explicit help from the RM. The more I think about it, the more it appears 
to me that this is the right solution. I'm filing a ticket, but this will take a 
while, so we may have to just do (2) for the time being.

 MapReduce AM should not use maxAttempts to determine if this is the last retry
 --

 Key: MAPREDUCE-5956
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5956
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mrv2
Reporter: Vinod Kumar Vavilapalli
Assignee: Wangda Tan
Priority: Blocker

 Found this while reviewing YARN-2074. The problem is that after YARN-2074, we 
 don't count AM preemption towards AM failures on the RM side, but the MapReduce 
 AM itself checks the attempt ID against the max-attempt count to determine if 
 this is the last attempt.
 {code}
 public void computeIsLastAMRetry() {
   isLastAMRetry = appAttemptID.getAttemptId() >= maxAppAttempts;
 }
 {code}
 This causes issues with, among other things, deletion of the staging directory.
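A minimal Java sketch of why the check above misfires once preemptions stop counting as failures. The class and method names are hypothetical, and the RM-side rule is an assumption modelled on the YARN-2074 description (attempt IDs keep increasing, but preempted attempts do not count toward maxAppAttempts); it is not the RM's actual code.

```java
import java.util.List;

/**
 * Hypothetical model (not Hadoop code) contrasting the MR AM's
 * "last retry" check with the RM's view after YARN-2074.
 */
final class LastRetrySimulation {

  /** The AM-side check from computeIsLastAMRetry(): attempt ID vs. limit. */
  static boolean amThinksLastRetry(int attemptId, int maxAppAttempts) {
    return attemptId >= maxAppAttempts;
  }

  /**
   * Assumed RM-side rule: preempted attempts are not counted as failures,
   * so the current attempt is the final one only if its own failure would
   * bring the counted failures up to the limit.
   */
  static boolean rmTreatsAsLastRetry(List<Boolean> priorAttemptsPreempted,
                                     int maxAppAttempts) {
    long countedFailures = priorAttemptsPreempted.stream()
        .filter(preempted -> !preempted)
        .count();
    return countedFailures + 1 >= maxAppAttempts;
  }
}
```

With maxAppAttempts = 2 and attempt 1 preempted, attempt 2 sees amThinksLastRetry(2, 2) return true while rmTreatsAsLastRetry([true], 2) returns false, so the AM would delete the staging directory even though the RM may still launch a third attempt.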



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054565#comment-14054565
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:


bq. Filing a ticket, but this will take a while and so we may have to just do 
(2) for the time being..
Filed YARN-2261.



[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054848#comment-14054848
 ] 

Hudson commented on MAPREDUCE-5517:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side 
resource configuration for deciding uber-mode on map-only jobs. Contributed by 
Siqi Li. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java


 enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
 to be less than yarn.app.mapreduce.am.resource.mb
 -

 Key: MAPREDUCE-5517
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Minor
 Fix For: 2.5.0

 Attachments: MAPREDUCE_5517_v3.patch.txt, 
 MAPREDUCE_5517_v4.patch.txt, MAPREDUCE_5517_v5.patch, MAPREDUCE_5517_v6.patch


 Since there is no reducer, the memory allocated to the reducer is irrelevant 
 when deciding whether to enable uber mode for a job.
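A simplified sketch of the memory part of the uber-mode decision, assuming the fix amounts to ignoring reduce memory when a job has no reducers. The class and method names are hypothetical; the real check in JobImpl also considers task counts, input size and other settings. The three memory parameters correspond to mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb.

```java
/** Hypothetical, simplified memory check for uber-mode eligibility. */
final class UberMemoryCheck {

  /**
   * Returns true if the job's task containers fit in the AM's memory.
   * Reduce memory is considered only when the job actually has reducers,
   * so a large reduce setting no longer blocks uber mode for map-only jobs.
   */
  static boolean fitsInAm(int numReduces, long mapMemoryMb,
                          long reduceMemoryMb, long amMemoryMb) {
    long required = (numReduces > 0)
        ? Math.max(mapMemoryMb, reduceMemoryMb)
        : mapMemoryMb;
    return required <= amMemoryMb;
  }
}
```

For a job with 1024 MB maps, a 4096 MB reduce setting and a 1536 MB AM, fitsInAm(0, 1024, 4096, 1536) is true for the map-only case, while fitsInAm(1, 1024, 4096, 1536) is false once a reducer is present.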





[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054849#comment-14054849
 ] 

Hudson commented on MAPREDUCE-5868:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the 
nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java


 TestPipeApplication causing nightly build to fail
 -

 Key: MAPREDUCE-5868
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5868
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Fix For: 2.5.0

 Attachments: MAPREDUCE-5868.2.patch, MAPREDUCE-5868.3.patch, 
 MAPREDUCE-5868.4.patch, TestPipeApplication.stack, jstack.log, 
 mapreduce-5868-v1.txt


 TestPipeApplication appears to be timing out which causes the nightly build 
 to fail.





[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054850#comment-14054850
 ] 

Hudson commented on MAPREDUCE-5866:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #607 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/607/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by 
Varun Vasudev. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java


 TestFixedLengthInputFormat fails in windows
 ---

 Key: MAPREDUCE-5866
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5866
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: client, test
Affects Versions: 3.0.0, 2.4.0
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 3.0.0, 2.6.0

 Attachments: apache-mapreduce-5866.1.patch, apache-yarn-1992.0.patch


 org.apache.hadoop.mapred.TestFixedLengthInputFormat and 
 org.apache.hadoop.mapreduce.lib.input.TestFixedLengthInputFormat tests fail 
 in Windows





[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054958#comment-14054958
 ] 

Hudson commented on MAPREDUCE-5868:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the 
nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java




[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054959#comment-14054959
 ] 

Hudson commented on MAPREDUCE-5866:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by 
Varun Vasudev. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java




[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054957#comment-14054957
 ] 

Hudson commented on MAPREDUCE-5517:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1798 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1798/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side 
resource configuration for deciding uber-mode on map-only jobs. Contributed by 
Siqi Li. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java



[jira] [Commented] (MAPREDUCE-5866) TestFixedLengthInputFormat fails in windows

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055019#comment-14055019
 ] 

Hudson commented on MAPREDUCE-5866:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5866. TestFixedLengthInputFormat fails in windows. Contributed by 
Varun Vasudev. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608585)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestFixedLengthInputFormat.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestFixedLengthInputFormat.java




[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055017#comment-14055017
 ] 

Hudson commented on MAPREDUCE-5517:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5517. Fixed MapReduce ApplicationMaster to not validate reduce side 
resource configuration for deciding uber-mode on map-only jobs. Contributed by 
Siqi Li. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java




[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail

2014-07-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055018#comment-14055018
 ] 

Hudson commented on MAPREDUCE-5868:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1825 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1825/])
MAPREDUCE-5868. Fixed an issue with TestPipeApplication that was causing the 
nightly builds to fail. Contributed by Akira Ajisaka. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608579)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/pipes/TestPipeApplication.java




[jira] [Updated] (MAPREDUCE-5885) build/test/test.mapred.spill causes release audit warnings

2014-07-08 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated MAPREDUCE-5885:
---

Attachment: MAPREDUCE-5885.patch

retrigger QA

 build/test/test.mapred.spill causes release audit warnings
 --

 Key: MAPREDUCE-5885
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5885
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: trunk
Reporter: Jason Lowe
Assignee: Chen He
 Attachments: MAPREDUCE-5885.patch, MAPREDUCE-5885.patch, 
 MAPREDUCE-5885.patch


 Multiple unit tests are creating files under 
 hadoop-mapreduce-client-jobclient/build/test/test.mapred.spill which are 
 causing release audit warnings during Jenkins patch precommit builds.  In 
 addition to being in a poor location for test output and not cleaning up 
 after the test, there are multiple tests using this location which will cause 
 conflicts if tests are run in parallel.
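One common way to avoid both the shared location and the leftover files is to give every test its own freshly created directory and delete it afterwards. The helper below is a hypothetical sketch using only java.nio.file; the choice of base directory (for example, somewhere under the Maven target/ directory rather than build/) is an assumption, not what the MAPREDUCE-5885 patch does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical test helper: one private spill directory per test. */
final class PerTestDirs {

  /** Creates a new, uniquely named directory under baseDir. */
  static Path createTestDir(Path baseDir, String testName) throws IOException {
    Files.createDirectories(baseDir);
    // createTempDirectory appends a random suffix, so two tests (or two
    // parallel runs of the same test) never share a directory.
    return Files.createTempDirectory(baseDir, testName + "-");
  }

  /** Deletes dir and everything under it; does nothing if dir is missing. */
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse lexicographic order visits children before their parents.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }
}
```

A test would call createTestDir in its setup method and deleteRecursively in its teardown method, so nothing is left behind for the release audit to flag.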





[jira] [Assigned] (MAPREDUCE-5962) Support CRC32C in IFile

2014-07-08 Thread James Thomas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Thomas reassigned MAPREDUCE-5962:
---

Assignee: James Thomas

 Support CRC32C in IFile
 ---

 Key: MAPREDUCE-5962
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5962
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task
Affects Versions: 2.5.0
Reporter: Todd Lipcon
Assignee: James Thomas

 Currently, the IFile format used by the MR shuffle checksums all data using 
 the zlib CRC32 polynomial. If we allow use of CRC32C instead, we can get a 
 large reduction in CPU usage by leveraging the native hardware CRC32C 
 implementation (approximately half a second of CPU time saved per GB checksummed).
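To illustrate that the two polynomials are not interchangeable, this sketch checksums the same bytes with the JDK's java.util.zip.CRC32 and java.util.zip.CRC32C (the latter requires Java 9+). It does not use Hadoop's own checksum classes, and whether CRC32C is hardware-accelerated depends on the JVM and CPU. Because the values differ, an IFile reader would presumably need to know which checksum type the writer used.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Computes the same bytes' checksum under both CRC polynomials. */
final class ChecksumCompare {

  private static long checksum(Checksum sum, byte[] data) {
    sum.update(data, 0, data.length);
    return sum.getValue();
  }

  /** zlib CRC32 polynomial, the one the issue says IFile uses today. */
  static long crc32(byte[] data) {
    return checksum(new CRC32(), data);
  }

  /** Castagnoli polynomial (CRC32C). */
  static long crc32c(byte[] data) {
    return checksum(new CRC32C(), data);
  }
}
```

For the standard check input "123456789", crc32 returns 0xCBF43926 and crc32c returns 0xE3069283.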





[jira] [Commented] (MAPREDUCE-5885) build/test/test.mapred.spill causes release audit warnings

2014-07-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055249#comment-14055249
 ] 

Hadoop QA commented on MAPREDUCE-5885:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654608/MAPREDUCE-5885.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4721//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4721//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-07-08 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055396#comment-14055396
 ] 

Hitesh Shah commented on MAPREDUCE-5956:


[~vinodkv] By definition, if an AM calls unregister, it is telling the RM that 
this is its last attempt and the app should not be retried. Are you now saying 
that every attempt should call an unregisterAttempt() that tells the app 
whether it is the final attempt, and then call a final unregister()? If not, I 
think something else is needed, as an AM will only call unregister() on an error 
if it thinks it is the last attempt. 

 



[jira] [Commented] (MAPREDUCE-5956) MapReduce AM should not use maxAttempts to determine if this is the last retry

2014-07-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055422#comment-14055422
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5956:


(AMRMClient|AMRMClientAsync).unregisterApplicationMaster() is a blocking call. 
If any attempt calls this API and it succeeds, that AM is the last retry, so the 
AM can go ahead and do its cleanup. All other attempts (those that either don't 
call this API or that fail before the API returns) do not need to do any cleanup. 
Of course, there are corner cases where this is not sufficient.

For that and all the failing cases, the only comprehensive solution I can think 
of is YARN-2261.
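The unregister-based approach described above can be sketched as follows. Unregisterer is a hypothetical stand-in for the blocking AMRMClient.unregisterApplicationMaster(FinalApplicationStatus, String, String) call, so the sketch runs without YARN; as noted above, it does not cover attempts that crash before unregistering.

```java
/** Hypothetical stand-in for the blocking unregister call. */
interface Unregisterer {
  /** Blocks until the RM acknowledges; throws if unregistration fails. */
  void unregister() throws Exception;
}

/** Hypothetical AM shutdown path: clean up only after a successful unregister. */
final class AmShutdown {

  /** Returns true if cleanup ran, i.e. this attempt was the last retry. */
  static boolean shutdown(Unregisterer rm, Runnable cleanupStagingDir) {
    try {
      rm.unregister();
    } catch (Exception e) {
      // Unregister did not complete (e.g. RM failover): the RM may launch
      // another attempt, so leave the staging directory for it.
      return false;
    }
    // The RM accepted the unregister, so no further attempt will run.
    cleanupStagingDir.run();
    return true;
  }
}
```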



[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

2014-07-08 Thread Gian Merlino (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gian Merlino updated MAPREDUCE-2094:


Attachment: MAPREDUCE-2094-FileInputFormat-docs.patch

We just hit this bug too. I'd really prefer a safer default (like {{return 
false}} or something like Niels's patch), but if that is not doable, better docs 
would have helped. The current docs imply that the default implementation 
handles splittable vs. non-splittable files, even though it doesn't.

I've attached some wording that I think is clearer.

 org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements 
 unsafe default behaviour that is different from the documented behaviour.
 ---

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-FileInputFormat-docs.patch


 When implementing a custom derivative of FileInputFormat, we ran into the 
 effect that a large gzipped input file would be processed several times. 
 A file of nearly 1 GiB would be processed around 36 times in its entirety, 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, but what we found is that the default 
 implementation of the isSplitable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and contradicts the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec."
 For our situation (where we always have gzipped input), we took the easy way 
 out and simply implemented an isSplitable in our class that does {{return false;}}
 There are essentially three ways I can think of to fix this (in order of 
 preference):
 # Implement something that looks at the compression used by the file (i.e. 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. {{return false}})
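Option 1 above is roughly what TextInputFormat already does in Hadoop 2.x: it looks up the file's codec with CompressionCodecFactory and treats the file as splittable only if there is no codec or the codec implements SplittableCompressionCodec. The sketch below mirrors that decision without Hadoop on the classpath; the suffix table is a hypothetical, simplified stand-in for the codec lookup and knows only gzip, deflate and bzip2.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of a compression-aware isSplitable decision. */
final class SplitDecision {

  /** Illustrative suffix table: true means the codec supports splitting. */
  private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
      ".gz", false,       // gzip is one stream and must be read from the start
      ".deflate", false,
      ".bz2", true);      // bzip2 has block markers a reader can sync to

  static boolean isSplitable(String fileName) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
      if (lower.endsWith(e.getKey())) {
        return e.getValue();
      }
    }
    // No known codec: treat the file as uncompressed, hence splittable.
    return true;
  }
}
```

With this decision, a gzipped input becomes a single split read by one mapper instead of being processed once per split.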





[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

2014-07-08 Thread Gian Merlino (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gian Merlino updated MAPREDUCE-2094:


Attachment: (was: MAPREDUCE-2094-FileInputFormat-docs.patch)

 org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements 
 unsafe default behaviour that is different from the documented behaviour.
 ---

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: MAPREDUCE-2094-2011-05-19.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply return true;. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method which states: Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be.  . The actual implementation 
 effectively does Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec. 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does return 
 false; 
 There are essentially three ways I can think of to fix this (in order of 
 preference):
 # Implement something that looks at the compression used for the file (i.e. 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it by making this method abstract.
 # Use a safe default (i.e. return false).
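A minimal, Hadoop-free sketch of the first option, assuming that splittability can be decided from the file-name suffix. The class name, the suffix table, and the static isSplitable(String) signature are hypothetical; the real method is {{isSplitable(JobContext, Path)}}, and a real fix would ask the configured codecs (TextInputFormat does something along these lines with CompressionCodecFactory and SplittableCompressionCodec) rather than a fixed table.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical, Hadoop-free sketch of option 1: derive splittability from the
 * compression suffix of the file name. This is not Hadoop's API.
 */
public class SplitabilityCheck {

    // Assumed suffix table for illustration: gzip is a stream codec and cannot
    // be split; bzip2 is block-based and can be.
    private static final Map<String, Boolean> CODEC_SPLITTABLE = new LinkedHashMap<>();
    static {
        CODEC_SPLITTABLE.put(".gz", false);
        CODEC_SPLITTABLE.put(".bz2", true);
    }

    /** True for files with no recognized codec, or whose codec is splittable. */
    static boolean isSplitable(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, Boolean> entry : CODEC_SPLITTABLE.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        // No recognized codec: treat the file as uncompressed, so splitting is safe.
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"part-00000", "access.log.gz", "access.log.bz2"}) {
            System.out.println(name + " -> " + isSplitable(name));
        }
    }
}
```

Any compressed file whose suffix is missing from the table is still treated as splittable, which is the same failure mode reported here; for input that is always Gzipped, returning false unconditionally, as the reporter did, remains the safer choice.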



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055613#comment-14055613
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Yes, I'm OK with the current patch. This approach won't scale to another 
feature, but it can be preserved in a refactoring.

My only remaining ask (fine to add during commit) is that {{CryptoUtils}} be 
annotated with {{@Private}} and {{@Unstable}}, so it's clearly marked as an 
implementation detail. If it could be package-private that would be even 
better, though I haven't checked to see if there's anything else in the 
{{o.a.h.mapreduce.task.crypto}} package.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. 
 HADOOP-10150 and HDFS-6134 bring encryption at rest for data in the filesystem 
 using the Hadoop FileSystem API. MapReduce intermediate data and spills should 
 also be encrypted at rest.
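To illustrate what "encrypted at rest" means for a local spill file, here is a minimal sketch that wraps the spill stream with plain JDK javax.crypto using AES/CTR. It is not the patch's {{CryptoUtils}}: the class and method names, the 16-byte IV header layout, and generating the key locally are all assumptions made for the example; in a real job the key would come from the job's credentials.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical sketch: encrypt a local spill stream at rest with JDK AES/CTR. */
public class SpillEncryptionSketch {
    private static final int IV_LEN = 16; // AES block size in bytes
    private static final SecureRandom RNG = new SecureRandom();

    /** Writes a fresh random IV header, then returns a stream that encrypts what follows. */
    static OutputStream wrapForWrite(OutputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        raw.write(iv); // the IV is not secret, but CTR mode needs a unique one per stream
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
    }

    /** Reads the IV header, then returns a stream that decrypts the rest. */
    static InputStream wrapForRead(InputStream raw, SecretKey key)
            throws IOException, GeneralSecurityException {
        byte[] iv = raw.readNBytes(IV_LEN);
        if (iv.length != IV_LEN) {
            throw new IOException("truncated spill header");
        }
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }

    /** Simulates writing a spill to local disk; returns the bytes as stored. */
    static byte[] spill(byte[] records, SecretKey key) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(disk, key)) {
            out.write(records);
        }
        return disk.toByteArray();
    }

    /** Simulates reading a spill back (e.g. during merge or shuffle). */
    static byte[] unspill(byte[] stored, SecretKey key) throws IOException, GeneralSecurityException {
        try (InputStream in = wrapForRead(new ByteArrayInputStream(stored), key)) {
            return in.readAllBytes();
        }
    }

    /** Generates a throwaway 128-bit AES key for the demo only. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] stored = spill("key\tvalue\n".getBytes(StandardCharsets.UTF_8), key);
        System.out.println(new String(unspill(stored, key), StandardCharsets.UTF_8));
    }
}
```

CTR mode was chosen for the sketch because it adds no padding, so the stored spill is exactly the plaintext length plus the IV header, which keeps offset arithmetic over spill segments simple.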





[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

2014-07-08 Thread Gian Merlino (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gian Merlino updated MAPREDUCE-2094:


Attachment: MAPREDUCE-2094-FileInputFormat-docs-v2.patch

I just attached a different patch that adjusts the class-level javadocs as 
well as the method javadocs.

 org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements 
 unsafe default behaviour that is different from the documented behaviour.
 ---

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch







[jira] [Updated] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-08 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated MAPREDUCE-5890:
---

Attachment: MAPREDUCE-5890.13.patch

Uploaded an updated patch. Thanks [~chris.douglas] for all the feedback!
I've marked the class {{@Private}} and {{@Unstable}}, but I can't make the class 
itself package-private, since its public static methods are used in a 
number of places.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, 
 MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, 
 MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz







[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055718#comment-14055718
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Sorry, I meant that if {{o.a.h.mapreduce.task.crypto}} only has {{CryptoUtils}} 
in it, then maybe the new package isn't necessary.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, 
 MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, 
 MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz







[jira] [Updated] (MAPREDUCE-5963) ShuffleHandler DB schema should be versioned with compatible/incompatible changes

2014-07-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-5963:
--

Issue Type: Sub-task  (was: New Feature)
Parent: MAPREDUCE-4150

 ShuffleHandler DB schema should be versioned with compatible/incompatible 
 changes
 -

 Key: MAPREDUCE-5963
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5963
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du

 ShuffleHandler persists job shuffle info into a DB whose schema should be 
 versioned with compatible/incompatible changes to support rolling upgrades.
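A minimal sketch of what a versioned schema check at startup could look like, assuming the common convention that a major-version bump marks an incompatible change and a minor-version bump a compatible one. The class, the enum, the version numbers, and the policy are all hypothetical and not taken from ShuffleHandler.

```java
/** Hypothetical sketch of a versioned state-store schema check for rolling upgrades. */
public class SchemaVersionCheck {

    /** What to do with a store written by another software version. */
    enum Action { LOAD, LOAD_AND_STORE_NEW_VERSION, FAIL }

    /** Assumed policy: major bump = incompatible change, minor bump = compatible change. */
    static final class SchemaVersion {
        final int major;
        final int minor;

        SchemaVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        boolean isCompatibleTo(SchemaVersion other) {
            return major == other.major;
        }

        boolean sameAs(SchemaVersion other) {
            return major == other.major && minor == other.minor;
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    // Hypothetical schema version written by this build of the shuffle service.
    static final SchemaVersion CURRENT = new SchemaVersion(1, 1);

    /** Decides how to treat the schema version read from the DB at startup. */
    static Action check(SchemaVersion stored) {
        if (stored.sameAs(CURRENT)) {
            return Action.LOAD;
        }
        if (stored.isCompatibleTo(CURRENT)) {
            // Compatible change: keep the persisted data and record the new version.
            return Action.LOAD_AND_STORE_NEW_VERSION;
        }
        // Incompatible change: refuse to start rather than misread persisted shuffle info.
        return Action.FAIL;
    }

    public static void main(String[] args) {
        SchemaVersion[] stored = {
            new SchemaVersion(1, 1), new SchemaVersion(1, 0), new SchemaVersion(2, 0)};
        for (SchemaVersion v : stored) {
            System.out.println(v + " -> " + check(v));
        }
    }
}
```

Failing fast on a major mismatch lets a rolling upgrade stop at the first node that cannot read its state, instead of silently losing shuffle data for running jobs.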





[jira] [Moved] (MAPREDUCE-5963) ShuffleHandler DB schema should be versioned with compatible/incompatible changes

2014-07-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du moved YARN-2265 to MAPREDUCE-5963:
-

  Component/s: (was: nodemanager)
 Target Version/s: 2.6.0  (was: 2.6.0)
Affects Version/s: (was: 2.4.1)
   2.4.1
  Key: MAPREDUCE-5963  (was: YARN-2265)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

 ShuffleHandler DB schema should be versioned with compatible/incompatible 
 changes
 -

 Key: MAPREDUCE-5963
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5963
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du



