[jira] [Updated] (MAPREDUCE-434) local map-reduce job limited to single reducer

2013-08-06 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-434:
-

Issue Type: Improvement  (was: Bug)

 local map-reduce job limited to single reducer
 --

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks() call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which 
 partitions based on the number of reduce tasks.
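The reporter's scenario can be sketched with a minimal hash partitioner (class and method names here are illustrative, not Hadoop's actual `Partitioner` API): once the local runner forces the reduce count to 1, every key collapses into partition 0, so the partition logic cannot be exercised locally.

```java
// Minimal sketch of the kind of partition function the reporter wants to
// debug locally. Names are illustrative, not Hadoop's real API.
public class SketchPartitioner {
    // Mirrors Partitioner#getPartition: spread keys over numReduceTasks buckets.
    public static int getPartition(String key, int numReduceTasks) {
        // Mask to keep the hash non-negative before taking the modulus.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma"};
        for (String k : keys) {
            // With the pre-patch LocalJobRunner, numReduceTasks is forced to 1,
            // so every key lands in partition 0 and the logic is untestable.
            System.out.println(k + " -> forced-local partition " + getPartition(k, 1)
                + ", multi-reducer partition " + getPartition(k, 6));
        }
    }
}
```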

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-434:
-

Summary: LocalJobRunner limited to single reducer  (was: local map-reduce 
job limited to single reducer)

 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks() call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which 
 partitions based on the number of reduce tasks.



[jira] [Updated] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-434:
-

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

Thanks Aaron and Tom for the review. Committed this to trunk and branch-2.

 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Fix For: 2.3.0

 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks() call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which 
 partitions based on the number of reduce tasks.



[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730456#comment-13730456
 ] 

Hudson commented on MAPREDUCE-434:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4219 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4219/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron 
Kimball via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java


 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Fix For: 2.3.0

 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks() call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which 
 partitions based on the number of reduce tasks.



[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730631#comment-13730631
 ] 

Hudson commented on MAPREDUCE-5446:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race 
conditions. Contributed by Jason Lowe. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510581)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 TestJobHistoryEvents and TestJobHistoryParsing have race conditions
 ---

 Key: MAPREDUCE-5446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.1.1-beta

 Attachments: MAPREDUCE-5446.patch


 TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for 
 MRApp to finish.  Currently they are polling the service state looking for 
 Service.STATE.STOPPED, but the service can appear to be in that state 
 *before* it is fully stopped.  This causes tests to finish with MRApp threads 
 still in-flight, and those threads can conflict with subsequent tests when 
 they collide in the filesystem.
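The race the report describes can be sketched with plain threads (a simplified analogue, not the real MRApp/Service API): a service flips its state to STOPPED before its worker has actually finished, so polling the state can return while work is still in flight, whereas joining the worker thread cannot.

```java
// Simplified analogue of the test race. Names are illustrative.
public class StopRaceSketch {
    enum State { STARTED, STOPPED }

    static volatile State state = State.STARTED;
    static volatile boolean workDone = false;

    // The "service": flips its state first, then keeps doing cleanup work.
    public static Thread startService() {
        Thread worker = new Thread(() -> {
            state = State.STOPPED;  // observable as stopped here...
            try { Thread.sleep(50); } catch (InterruptedException e) { }
            workDone = true;        // ...while cleanup is still running
        });
        worker.start();
        return worker;
    }

    // Old test behavior: poll the state field; may return before cleanup ends.
    public static boolean waitByPolling() {
        while (state != State.STOPPED) {
            try { Thread.sleep(1); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); break; }
        }
        return workDone; // can still be false: that is the race
    }

    // Fixed behavior in spirit: wait for the thread itself to terminate.
    public static boolean waitByJoining(Thread worker) {
        try { worker.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return workDone; // guaranteed true once join() returns normally
    }

    public static void main(String[] args) {
        Thread w = startService();
        System.out.println("polling sees workDone=" + waitByPolling());
        System.out.println("joining sees workDone=" + waitByJoining(w));
    }
}
```

`Thread.join()` establishes a happens-before edge with the thread's completion, which is why the second wait is reliable.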



[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730625#comment-13730625
 ] 

Hudson commented on MAPREDUCE-434:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron 
Kimball via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java


 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Fix For: 2.3.0

 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks() call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which 
 partitions based on the number of reduce tasks.



[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730637#comment-13730637
 ] 

Hudson commented on MAPREDUCE-5367:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510610)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java


 Local jobs all use same local working directory
 ---

 Key: MAPREDUCE-5367
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 1.3.0, 2.1.1-beta

 Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, 
 MAPREDUCE-5367.patch


 This means that local jobs, even in different JVMs, can't run concurrently 
 because they might delete each other's files during work directory setup.
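The failure mode and the obvious remedy can be sketched as follows (paths, class, and method names are illustrative, not what LocalJobRunner actually uses): a single shared work directory collides across concurrent jobs, while a directory scoped by a unique job id does not.

```java
import java.io.File;
import java.util.UUID;

// Sketch of shared vs. per-job local work directories. Illustrative only.
public class LocalWorkDirSketch {
    // Pre-fix behavior: one fixed location for every local job, so two
    // concurrent jobs set up and delete the same directory.
    public static File sharedWorkDir(File root) {
        return new File(root, "local/work");
    }

    // Post-fix style: scope the directory by a unique job identifier so
    // concurrent jobs, even in different JVMs, cannot trample each other.
    public static File perJobWorkDir(File root, String jobId) {
        return new File(new File(root, "local"), jobId);
    }

    public static void main(String[] args) {
        File root = new File(System.getProperty("java.io.tmpdir"));
        String jobA = "job_local_" + UUID.randomUUID();
        String jobB = "job_local_" + UUID.randomUUID();
        System.out.println("shared: " + sharedWorkDir(root));
        System.out.println("job A:  " + perJobWorkDir(root, jobA));
        System.out.println("job B:  " + perJobWorkDir(root, jobB));
    }
}
```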



[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730640#comment-13730640
 ] 

Hudson commented on MAPREDUCE-5399:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #293 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/293/])
MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream 
slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java


 Unnecessary Configuration instantiation in IFileInputStream slows down merge
 

 Key: MAPREDUCE-5399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-5399.patch


 We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from 
 4.1.2 to 4.3.0, I noticed some performance deterioration in our MR job in the 
 reduce phase. The MR job usually has 10 000 map tasks (10 000 input files, each 
 about 100MB) and 6 000 reducers (one reducer per table region). I was trying to 
 figure out at which phase the slowdown appears (at first I suspected that the 
 slow gathering of the 10 000 map output files was the culprit) and found that 
 the problem is not reading the map output (the shuffle) but the sort/merge 
 phase that follows; the last, actual reduce phase is fast. I tried raising 
 io.sort.factor because I thought lots of small files were being merged on 
 disk, but upping it to 1000 made no difference. I then printed the stack trace 
 and found that the problem is the initialization of 
 org.apache.hadoop.mapred.IFileInputStream, namely the creation of the 
 Configuration object, which is not propagated from the earlier context; see 
 the stack trace:
 Thread 13332: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) 
 @bci=136, line=999 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) 
 @bci=3, line=966 (Compiled frame)
  - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, 
 line=146 (Compiled frame)
  - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction, 
 java.security.AccessControlContext) @bci=0 (Compiled frame)
  - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 
 (Compiled frame)
  - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 
 (Compiled frame)
  - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, 
 line=1192 (Compiled frame)
  - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
 @bci=0 (Compiled frame)
  - 
 javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, 
 java.lang.String) @bci=10, line=89 (Compiled frame)
  - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) 
 @bci=38, line=250 (Interpreted frame)
  - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) 
 @bci=273, line=223 (Interpreted frame)
  - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, 
 org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, 
 java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getProps() 
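The slowdown pattern in the trace above can be sketched with a stdlib-only analogue (a simplified stand-in, not Hadoop's real Configuration or IFileInputStream): constructing a configuration re-parses its resources every time, so building one per input stream during a large merge pays that cost thousands of times, while propagating one instance from the caller pays it once.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified analogue of the reported merge slowdown. Names are illustrative.
public class ConfReuseSketch {
    static final AtomicInteger parses = new AtomicInteger();

    static class Config {
        Config() { parses.incrementAndGet(); } // stands in for XML resource parsing
        boolean getBoolean(String key, boolean dflt) { return dflt; }
    }

    // Pre-fix pattern: each stream builds its own Config on construction.
    static void openStreamFresh() {
        Config conf = new Config();
        conf.getBoolean("ifile.readahead", true);
    }

    // Post-fix pattern: the caller's Config is passed through and reused.
    static void openStreamShared(Config conf) {
        conf.getBoolean("ifile.readahead", true);
    }

    public static void main(String[] args) {
        int segments = 1000; // e.g. merge inputs from many map outputs
        for (int i = 0; i < segments; i++) openStreamFresh();
        int fresh = parses.get();
        parses.set(0);
        Config shared = new Config();
        for (int i = 0; i < segments; i++) openStreamShared(shared);
        System.out.println("fresh: " + fresh + " parses, shared: " + parses.get() + " parse(s)");
    }
}
```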

[jira] [Updated] (MAPREDUCE-5450) Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1

2013-08-06 Thread Stanislav Barton (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Barton updated MAPREDUCE-5450:


Attachment: MAPREDUCE-5450-1.1.0.txt

 Unnecessary Configuration instantiation in IFileInputStream slows down merge 
 - Port to branch-1
 ---

 Key: MAPREDUCE-5450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
 Fix For: 1.2.1

 Attachments: MAPREDUCE-5450-1.1.0.txt


 We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from 
 4.1.2 to 4.3.0, I noticed some performance deterioration in our MR job in the 
 reduce phase. The MR job usually has 10 000 map tasks (10 000 input files, each 
 about 100MB) and 6 000 reducers (one reducer per table region). I was trying to 
 figure out at which phase the slowdown appears (at first I suspected that the 
 slow gathering of the 10 000 map output files was the culprit) and found that 
 the problem is not reading the map output (the shuffle) but the sort/merge 
 phase that follows; the last, actual reduce phase is fast. I tried raising 
 io.sort.factor because I thought lots of small files were being merged on 
 disk, but upping it to 1000 made no difference. I then printed the stack trace 
 and found that the problem is the initialization of 
 org.apache.hadoop.mapred.IFileInputStream, namely the creation of the 
 Configuration object, which is not propagated from the earlier context; see 
 the stack trace:
 Thread 13332: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) 
 @bci=136, line=999 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) 
 @bci=3, line=966 (Compiled frame)
  - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, 
 line=146 (Compiled frame)
  - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction, 
 java.security.AccessControlContext) @bci=0 (Compiled frame)
  - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 
 (Compiled frame)
  - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 
 (Compiled frame)
  - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, 
 line=1192 (Compiled frame)
  - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
 @bci=0 (Compiled frame)
  - 
 javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, 
 java.lang.String) @bci=10, line=89 (Compiled frame)
  - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) 
 @bci=38, line=250 (Interpreted frame)
  - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) 
 @bci=273, line=223 (Interpreted frame)
  - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, 
 org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, 
 java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, 
 line=712 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, 
 line=731 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) 
 @bci=2, line=1047 (Interpreted frame)
  - org.apache.hadoop.mapred.IFileInputStream.<init>(java.io.InputStream, 
 long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted 
 frame)
  - 
 org.apache.hadoop.mapred.IFile$Reader.<init>(org.apache.hadoop.conf.Configuration,
  org.apache.hadoop.fs.FSDataInputStream, long, 
 org.apache.hadoop.io.compress.CompressionCodec, 
 org.apache.hadoop.mapred.Counters$Counter) @bci=60, line=303 (Interpreted 
 frame)
  - 
 

[jira] [Updated] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file

2013-08-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5432:
--

Attachment: MAPREDUCE-5432.1.patch

Updated JobHistoryServer to fetch these attributes from events.

 JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, 
 physMemKbytes from history file
 ---

 Key: MAPREDUCE-5432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
 Attachments: MAPREDUCE-5432.1.patch


 JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at 
 MapAttemptFinishedEvent's   
   int[] clockSplits;
   int[] cpuUsages;
   int[] vMemKbytes;
   int[] physMemKbytes;
 JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to 
 have these as members so that handleMapAttemptFinishedEvent() can get them 
 and store them.
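The enhancement described above can be sketched as follows (the class and field names mirror the issue text, but the surrounding API is a simplified stand-in, not Hadoop's real JobHistoryParser): give the per-attempt record the four array fields and copy them over when the finished event is handled.

```java
import java.util.Arrays;

// Hedged sketch of the proposed parser change. Simplified, illustrative API.
public class HistoryParseSketch {
    static class MapAttemptFinishedEvent {
        int[] clockSplits, cpuUsages, vMemKbytes, physMemKbytes;
    }

    // TaskAttemptInfo enhanced with the same four members.
    static class TaskAttemptInfo {
        int[] clockSplits, cpuUsages, vMemKbytes, physMemKbytes;
    }

    // Analogue of handleMapAttemptFinishedEvent(): copy the profiles onto the info.
    static void handleMapAttemptFinished(MapAttemptFinishedEvent e, TaskAttemptInfo info) {
        info.clockSplits  = e.clockSplits;
        info.cpuUsages    = e.cpuUsages;
        info.vMemKbytes   = e.vMemKbytes;
        info.physMemKbytes = e.physMemKbytes;
    }

    public static void main(String[] args) {
        MapAttemptFinishedEvent e = new MapAttemptFinishedEvent();
        e.cpuUsages = new int[] {40, 60, 80};
        TaskAttemptInfo info = new TaskAttemptInfo();
        handleMapAttemptFinished(e, info);
        System.out.println("cpuUsages on info: " + Arrays.toString(info.cpuUsages));
    }
}
```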



[jira] [Updated] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file

2013-08-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5432:
--

Assignee: Tsuyoshi OZAWA
  Status: Patch Available  (was: Open)

 JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, 
 physMemKbytes from history file
 ---

 Key: MAPREDUCE-5432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5432.1.patch


 JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at 
 MapAttemptFinishedEvent's   
   int[] clockSplits;
   int[] cpuUsages;
   int[] vMemKbytes;
   int[] physMemKbytes;
 JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to 
 have these as members so that handleMapAttemptFinishedEvent() can get them 
 and store them.



[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730740#comment-13730740
 ] 

Hudson commented on MAPREDUCE-5367:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510610)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java


 Local jobs all use same local working directory
 ---

 Key: MAPREDUCE-5367
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 1.3.0, 2.1.1-beta

 Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, 
 MAPREDUCE-5367.patch


 This means that local jobs, even in different JVMs, can't run concurrently 
 because they might delete each other's files during work directory setup.



[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730728#comment-13730728
 ] 

Hudson commented on MAPREDUCE-434:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron 
Kimball via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java


 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Fix For: 2.3.0

 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which tries to 
 partition based on the number of reduce tasks.
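The debugging gap described above can be shown with a minimal plain-Java sketch (no Hadoop dependency; the hash-mod scheme mirrors Hadoop's default HashPartitioner, and the method shape mirrors Partitioner.getPartition, which receives the configured reducer count as numPartitions):

```java
public class PartitionSketch {
    // Mirrors the common hash-mod partitioning scheme; numPartitions is the
    // value Hadoop derives from the job's configured number of reduce tasks.
    static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // With LocalJobRunner pinned to one reducer, every key collapses to
        // partition 0, so custom partition logic is never exercised locally.
        if (getPartition("anyKey", 1) != 0) throw new AssertionError();
        // Only with several reducers do keys actually spread across partitions.
        int p = getPartition("anyKey", 6);
        if (p < 0 || p >= 6) throw new AssertionError();
        System.out.println("ok");
    }
}
```

With numPartitions fixed at 1, the modulus makes the return value constant regardless of the key, which is why the reporter cannot debug partitioning under the local runner.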

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730734#comment-13730734
 ] 

Hudson commented on MAPREDUCE-5446:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race 
conditions. Contributed by Jason Lowe. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510581)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 TestJobHistoryEvents and TestJobHistoryParsing have race conditions
 ---

 Key: MAPREDUCE-5446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.1.1-beta

 Attachments: MAPREDUCE-5446.patch


 TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for 
 MRApp to finish.  Currently they are polling the service state looking for 
 Service.STATE.STOPPED, but the service can appear to be in that state 
 *before* it is fully stopped.  This causes tests to finish with MRApp threads 
 still in-flight, and those threads can conflict with subsequent tests when 
 they collide in the filesystem.
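The race described above can be reproduced in miniature in plain Java (illustrative names only, not MRApp's actual API): a service that flips its state flag before its worker thread finishes cleanup, so polling the state is not enough and the test must wait on the thread itself.

```java
public class StopRaceSketch {
    enum State { STARTED, STOPPED }

    static class Service {
        volatile State state = State.STARTED;
        final Thread worker = new Thread(() -> {
            state = State.STOPPED;   // state flips first...
            cleanupFiles();          // ...while filesystem cleanup still runs
        });
        void cleanupFiles() { /* filesystem work that can collide with other tests */ }
        void stop() { worker.start(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Service s = new Service();
        s.stop();
        // Fragile: polling s.state can observe STOPPED while cleanup is in flight.
        // Robust: wait for the worker thread itself to terminate.
        s.worker.join();
        if (s.state != State.STOPPED) throw new AssertionError();
        System.out.println("fully stopped");
    }
}
```

Joining (or otherwise awaiting thread termination) guarantees no in-flight threads survive into the next test, which is the property the polling approach lacks.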



[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730743#comment-13730743
 ] 

Hudson commented on MAPREDUCE-5399:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1483 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1483/])
MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream 
slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java


 Unnecessary Configuration instantiation in IFileInputStream slows down merge
 

 Key: MAPREDUCE-5399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-5399.patch


 We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from 
 4.1.2 to 4.3.0, I noticed some performance deterioration in our MR job 
 in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 input 
 files, each about 100MB) and 6 000 reducers (one reducer per table region). I 
 was trying to figure out at which phase the slowdown appears (at first I 
 suspected that the slow gathering of the 1 map output files was the 
 culprit) and found out that the problem is not reading the map output (the 
 shuffle) but the sort/merge phase that follows - the last and actual reduce 
 phase is fast. I tried raising io.sort.factor because I thought 
 lots of small files were being merged on disk, but upping it to 1000 
 didn't make any difference. I then printed the stack trace and found out 
 that the problem is the initialization of 
 org.apache.hadoop.mapred.IFileInputStream, namely the creation of a 
 Configuration object that is not propagated along from the earlier context; see 
 the stack trace:
 Thread 13332: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) 
 @bci=136, line=999 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) 
 @bci=3, line=966 (Compiled frame)
  - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, 
 line=146 (Compiled frame)
  - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction, 
 java.security.AccessControlContext) @bci=0 (Compiled frame)
  - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 
 (Compiled frame)
  - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 
 (Compiled frame)
  - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, 
 line=1192 (Compiled frame)
  - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
 @bci=0 (Compiled frame)
  - 
 javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, 
 java.lang.String) @bci=10, line=89 (Compiled frame)
  - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) 
 @bci=38, line=250 (Interpreted frame)
  - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) 
 @bci=273, line=223 (Interpreted frame)
  - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, 
 org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, 
 java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getProps() 
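The pattern the report implies can be sketched in plain Java (a hypothetical Config class stands in for org.apache.hadoop.conf.Configuration, whose real constructor triggers the XML resource parsing shown in the stack trace): construct the configuration once and propagate it, instead of rebuilding it for every stream.

```java
public class ConfigReuseSketch {
    static int constructions = 0;

    // Stand-in for org.apache.hadoop.conf.Configuration; the real constructor
    // loads and parses XML resources, which is expensive in a hot path.
    static class Config {
        Config() { constructions++; }
    }

    // Stand-in for a per-segment stream: takes the shared config from its
    // caller's context rather than constructing a fresh one.
    static class StreamWrapper {
        final Config conf;
        StreamWrapper(Config conf) { this.conf = conf; }
    }

    public static void main(String[] args) {
        Config shared = new Config();
        // One merge over 6 000 segments opens 6 000 streams, but the
        // configuration is parsed exactly once.
        for (int i = 0; i < 6000; i++) new StreamWrapper(shared);
        if (constructions != 1) throw new AssertionError();
        System.out.println("constructed once for 6000 streams");
    }
}
```

This is only the general shape of the fix; the actual patch threads the existing Configuration through BackupStore, InMemoryReader, and MergeManagerImpl, per the file list above.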

[jira] [Commented] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file

2013-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730750#comment-13730750
 ] 

Hadoop QA commented on MAPREDUCE-5432:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12596342/MAPREDUCE-5432.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3933//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3933//console

This message is automatically generated.

 JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, 
 physMemKbytes from history file
 ---

 Key: MAPREDUCE-5432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5432.1.patch


 JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at 
 MapAttemptFinishedEvent's   
   int[] clockSplits;
   int[] cpuUsages;
   int[] vMemKbytes;
   int[] physMemKbytes;
 JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to 
 have these as members so that handleMapAttemptFinishedEvent() can get them 
 and store them.



[jira] [Commented] (MAPREDUCE-434) LocalJobRunner limited to single reducer

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730797#comment-13730797
 ] 

Hudson commented on MAPREDUCE-434:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/])
MAPREDUCE-434. LocalJobRunner limited to single reducer (Sandy Ryza and Aaron 
Kimball via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510866)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ReduceTask.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/ShuffleConsumerPlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/TestShufflePlugin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestJobCounters.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/lib/TestKeyFieldBasedComparator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLocalRunner.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/partition/TestMRKeyFieldBasedComparator.java


 LocalJobRunner limited to single reducer
 

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Fix For: 2.3.0

 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.6.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch, 
 MAPREDUCE-434.patch, MAPREDUCE-434.patch, MAPREDUCE-434.patch


 When mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
 ignored and the number of reduce tasks is fixed at 1.
 This prevents me from locally debugging my partition function, which tries to 
 partition based on the number of reduce tasks.



[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730803#comment-13730803
 ] 

Hudson commented on MAPREDUCE-5446:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/])
MAPREDUCE-5446. TestJobHistoryEvents and TestJobHistoryParsing have race 
conditions. Contributed by Jason Lowe. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510581)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java


 TestJobHistoryEvents and TestJobHistoryParsing have race conditions
 ---

 Key: MAPREDUCE-5446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 3.0.0, 2.1.1-beta

 Attachments: MAPREDUCE-5446.patch


 TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for 
 MRApp to finish.  Currently they are polling the service state looking for 
 Service.STATE.STOPPED, but the service can appear to be in that state 
 *before* it is fully stopped.  This causes tests to finish with MRApp threads 
 still in-flight, and those threads can conflict with subsequent tests when 
 they collide in the filesystem.



[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730809#comment-13730809
 ] 

Hudson commented on MAPREDUCE-5367:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/])
MAPREDUCE-5367. Local jobs all use same local working directory (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510610)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java


 Local jobs all use same local working directory
 ---

 Key: MAPREDUCE-5367
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 1.3.0, 2.1.1-beta

 Attachments: MAPREDUCE-5367-b1-1.patch, MAPREDUCE-5367-b1.patch, 
 MAPREDUCE-5367.patch


 This means that local jobs, even in different JVMs, can't run concurrently 
 because they might delete each other's files during work directory setup.



[jira] [Commented] (MAPREDUCE-5399) Unnecessary Configuration instantiation in IFileInputStream slows down merge

2013-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730812#comment-13730812
 ] 

Hudson commented on MAPREDUCE-5399:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1510 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1510/])
MAPREDUCE-5399. Unnecessary Configuration instantiation in IFileInputStream 
slows down merge. (Stanislav Barton via Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1510811)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/BackupStore.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java


 Unnecessary Configuration instantiation in IFileInputStream slows down merge
 

 Key: MAPREDUCE-5399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-5399.patch


 We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from 
 4.1.2 to 4.3.0, I noticed some performance deterioration in our MR job 
 in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 input 
 files, each about 100MB) and 6 000 reducers (one reducer per table region). I 
 was trying to figure out at which phase the slowdown appears (at first I 
 suspected that the slow gathering of the 1 map output files was the 
 culprit) and found out that the problem is not reading the map output (the 
 shuffle) but the sort/merge phase that follows - the last and actual reduce 
 phase is fast. I tried raising io.sort.factor because I thought 
 lots of small files were being merged on disk, but upping it to 1000 
 didn't make any difference. I then printed the stack trace and found out 
 that the problem is the initialization of 
 org.apache.hadoop.mapred.IFileInputStream, namely the creation of a 
 Configuration object that is not propagated along from the earlier context; see 
 the stack trace:
 Thread 13332: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) 
 @bci=136, line=999 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) 
 @bci=3, line=966 (Compiled frame)
  - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, 
 line=146 (Compiled frame)
  - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction, 
 java.security.AccessControlContext) @bci=0 (Compiled frame)
  - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 
 (Compiled frame)
  - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 
 (Compiled frame)
  - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, 
 line=1192 (Compiled frame)
  - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
 @bci=0 (Compiled frame)
  - 
 javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, 
 java.lang.String) @bci=10, line=89 (Compiled frame)
  - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) 
 @bci=38, line=250 (Interpreted frame)
  - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) 
 @bci=273, line=223 (Interpreted frame)
  - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, 
 org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, 
 java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
  - 

[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731036#comment-13731036
 ] 

Mariappan Asokan commented on MAPREDUCE-1176:
-

I was looking for an implementation of this record format as well.  I agree 
with the following comment by Todd:
{quote}
As a general note, I'm not sure I agree with the design here. Rather than 
forcing the split to lie on record boundaries, I think it would be simpler to 
simply let FileInputFormat compute its own splits, and then when you first open 
the record reader, skip forward to the next record boundary and begin reading 
from there. Then for the last record of the file, over-read your split into 
the beginning of the next one. This is the strategy that other input formats 
take, and should be compatible with the splittable compression codecs (see 
TextInputFormat for example).
{quote}
I think we should support fixed length records spanning across HDFS blocks.
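Todd's alternative (accept whatever splits FileInputFormat computes, then skip forward to the next record boundary when the reader opens) reduces, for fixed-length records, to simple modular arithmetic. A sketch with illustrative names:

```java
public class BoundarySkipSketch {
    // Offset of the first record boundary at or after splitStart, for
    // fixed-length records that start at file offset 0.
    static long firstRecordOffset(long splitStart, long recordLength) {
        long remainder = splitStart % recordLength;
        return remainder == 0 ? splitStart : splitStart + (recordLength - remainder);
    }

    public static void main(String[] args) {
        // A split starting mid-record (offset 250, 100-byte records) skips
        // forward to offset 300; a record-aligned split starts in place.
        if (firstRecordOffset(250, 100) != 300) throw new AssertionError();
        if (firstRecordOffset(400, 100) != 400) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Each reader then consumes whole records from that offset, over-reading past the split end to finish its last record; records straddling HDFS block boundaries are handled naturally because split boundaries no longer need to align with records.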

BitsOfInfo, do you mind if I pick up your patch, enhance it to take care of the 
above case, and post a patch for the trunk?

I would appreciate it if a committer could come forward to review the patch and 
commit it to the trunk.

Thanks.

-- Asokan

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed length (fixed width) records. Such 
 files have no CR/LF (or any combination thereof), no delimiters etc, but each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Provided are two classes: the first is FixedLengthInputFormat, with its 
 corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() in order to ensure that 
 InputSplits do not contain any partial records, since with fixed records there 
 is no way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method, and then adjusts the returned split size as follows: 
 (Math.floor(fileInputFormatComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength)
 This suite of fixed length input format classes does not support compressed 
 files. 
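The split-size adjustment described above can be sketched with integer arithmetic (values are assumed for illustration; in Java, long division already truncates toward zero, so the Math.floor is implicit for positive operands):

```java
public class SplitSizeSketch {
    // Round the FileInputFormat-computed split size down to a whole
    // number of fixed-length records, so no split ends mid-record.
    static long adjustSplitSize(long computedSplitSize, long fixedRecordLength) {
        return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
    }

    public static void main(String[] args) {
        // A 64 MB computed split with 100-byte records is trimmed to
        // 67 108 800 bytes, i.e. exactly 671 088 whole records.
        long adjusted = adjustSplitSize(67108864L, 100L);
        if (adjusted != 67108800L) throw new AssertionError();
        if (adjusted % 100L != 0L) throw new AssertionError();
        System.out.println(adjusted);
    }
}
```

The trimmed remainder (here 64 bytes) is not lost: it simply lands at the start of the next split, which again begins on a record boundary.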



[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5311:
--

Summary: Replace SLOTS_MILLIS counters with MEM_MILLIS  (was: Remove slot 
millis computation logic and deprecate counter constants)

 Replace SLOTS_MILLIS counters with MEM_MILLIS
 -

 Key: MAPREDUCE-5311
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, 
 MAPREDUCE-5311.patch


 Per the discussion in MAPREDUCE-5310 and comments in the code, we should remove 
 all the related logic and just leave the counter constants in place, deprecated, 
 for backwards compatibility.



[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731067#comment-13731067
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5311:
---

[~acmurthy], unless I got things wrong, we agreed to keep slot-millis around 
until we have memory-millis. And the latest patch here is doing that. 

 Replace SLOTS_MILLIS counters with MEM_MILLIS
 -

 Key: MAPREDUCE-5311
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, 
 MAPREDUCE-5311.patch


 Per the discussion in MAPREDUCE-5310 and comments in the code, we should remove 
 all the related logic and just leave the counter constants in place, deprecated, 
 for backwards compatibility.



[jira] [Commented] (MAPREDUCE-5432) JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, physMemKbytes from history file

2013-08-06 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731078#comment-13731078
 ] 

Vrushali C commented on MAPREDUCE-5432:
---

A few minor comments:

+  printAttributes("CLOCK_SPLIT", clockSplits);
+  printAttributes("CPU_USAGE", cpuUsages);
Maybe print them as CLOCK_SPLITS and CPU_USAGES so that this is consistent with 
the naming of the attributes?


for the following function: 
+void printAttributes(String tag, int[] attributes) {
+  for (int i = 0; i < attributes.length; i++) {
+    System.out.println(tag + "[" + i + "]: " + attributes[i]);
+  }
+}

I believe it would print it as: 
CLOCK_SPLIT[0]: 8691
CLOCK_SPLIT[1]: 129
CLOCK_SPLIT[2]: 128
CLOCK_SPLIT[3]: 129
CLOCK_SPLIT[4]: 128
... etc

Would it be slightly better to print it as: 
clockSplits:[8691,129,128,129,128,129,128,129,128,129,128,129]
cpuUsages:[28,28,29,28,28,29,28,28,29,28,28,29]
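For the compact format suggested above, java.util.Arrays.toString gives very nearly that output out of the box (it adds a space after each comma). A minimal sketch, with illustrative class and method names:

```java
import java.util.Arrays;

public class PrintAttributesSketch {
    // One line per attribute array instead of one line per element.
    static String format(String tag, int[] attributes) {
        return tag + ":" + Arrays.toString(attributes);
    }

    public static void main(String[] args) {
        // Prints: clockSplits:[8691, 129, 128]
        System.out.println(format("clockSplits", new int[] {8691, 129, 128}));
    }
}
```

This keeps the whole array on one line and stays readable for the dozen-or-so progress splits these arrays typically hold.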




 JobHistoryParser does not fetch clockSplits, cpuUsages, vMemKbytes, 
 physMemKbytes from history file
 ---

 Key: MAPREDUCE-5432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5432
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-5432.1.patch


 JobHistoryParser's handleMapAttemptFinishedEvent() function does not look at 
 MapAttemptFinishedEvent's   
   int[] clockSplits;
   int[] cpuUsages;
   int[] vMemKbytes;
   int[] physMemKbytes;
 JobHistoryParser's inner class TaskAttemptInfo also needs to be enhanced to 
 have these as members so that handleMapAttemptFinishedEvent() can get them 
 and store them.



[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread Debashis Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731076#comment-13731076
 ] 

Debashis Saha commented on MAPREDUCE-1176:
--

The reason other input formats take that approach is that they don't have any
other way to figure out the exact boundary. With a fixed format you can know
the boundary exactly, and in my opinion you should take advantage of it.

-- 
- Deba
--~O~--


 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Provided are two classes: FixedLengthInputFormat and its corresponding 
 FixedLengthRecordReader. When creating a job that specifies this input 
 format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", myFixedRecordLength);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 myFixedRecordLength);
 This input format overrides computeSplitSize() in order to ensure that 
 InputSplits do not contain any partial records, since with fixed records 
 there is no way to determine where a record begins if that were to occur. 
 Each InputSplit passed to the FixedLengthRecordReader will start at the 
 beginning of a record, and the last byte in the InputSplit will be the last 
 byte of a record. The override of computeSplitSize() delegates to 
 FileInputFormat's compute method, and then adjusts the returned split size to 
 Math.floor(computedSplitSize / fixedRecordLength) * fixedRecordLength.
 This suite of fixed-length input format classes does not support compressed 
 files. 
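The computeSplitSize() adjustment described above can be sketched as a standalone helper (a simplified illustration, not the patch code itself):

```java
// Round the split size computed by FileInputFormat down to a whole
// number of fixed-length records, so no InputSplit ends mid-record.
class FixedLengthSplitSize {
    static long adjustSplitSize(long computedSplitSize, long fixedRecordLength) {
        // Integer division floors the quotient, matching
        // Math.floor(computedSplitSize / fixedRecordLength) * fixedRecordLength.
        return (computedSplitSize / fixedRecordLength) * fixedRecordLength;
    }
}
```

For example, with a 100-byte record length and a 64 MB computed split size (67108864 bytes), the adjusted split size is 67108800 bytes, i.e. exactly 671088 records.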



[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731089#comment-13731089
 ] 

Mariappan Asokan commented on MAPREDUCE-1176:
-

Hi Debashis,
  You are correct.  It is easy to identify records spanning across HDFS blocks.

-- Asokan



[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread BitsOfInfo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731149#comment-13731149
 ] 

BitsOfInfo commented on MAPREDUCE-1176:
---

Asokan: Sure, go ahead and make whatever changes are necessary; I have no time 
to work on this anymore, yet I would like to see it put into the project, as I 
had a use for it when I created it and I'm sure others do as well.

BTW: my original question from a few years ago regarding the design was never 
answered; maybe I was missing something.

bq. Hmm, ok, do you have a suggestion on how I detect where one record begins 
and one record ends when records are not identifiable by any sort of consistent 
start or end character boundary but just flow together? I could see the 
RecordReader detecting that it only read fewer than RECORD_LENGTH bytes on 
hitting the end of the split and discarding it. But I am not sure how it would 
detect the start of a record, with a split that has partial data at the start 
of it. Especially if there is no consistent boundary/char marker that 
identifies the start of a record.





[jira] [Updated] (MAPREDUCE-5439) mapred-default.xml has missing properties

2013-08-06 Thread Jaimin D Jetly (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaimin D Jetly updated MAPREDUCE-5439:
--

Summary: mapred-default.xml has missing properties  (was: 
mpared-default.xml has missing properties)

 mapred-default.xml has missing properties
 -

 Key: MAPREDUCE-5439
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5439
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.1.0-beta
Reporter: Siddharth Wagle
 Fix For: 2.1.0-beta


 Properties that need to be added:
 mapreduce.map.memory.mb
 mapreduce.map.java.opts
 mapreduce.reduce.memory.mb
 mapreduce.reduce.java.opts
 Properties that need to be fixed:
 mapred.child.java.opts should not be in mapred-default.
 yarn.app.mapreduce.am.command-opts description needs fixing



[jira] [Commented] (MAPREDUCE-5449) Jobtracker WebUI Add acl to control access to views of submitted jobs

2013-08-06 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731219#comment-13731219
 ] 

Chris Nauroth commented on MAPREDUCE-5449:
--

Hi, Christopher.  Is this a duplicate of MAPREDUCE-5109?

 Jobtracker WebUI Add acl to control access to views of submitted jobs
 -

 Key: MAPREDUCE-5449
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5449
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Christopher LaPlante

 Jobtracker WebUI currently displays the job name for every job submitted. 
 Would like the ability to apply ACL's so that users/groups can only see the 
 job's they have submitted and not all the jobs submitted to the cluster. Hive 
 queries put the query as the job name in job tracker. This query information 
 can contain sensitive information, currently the only way to limit access is 
 to limit access to the job tracker ui which reduces the job owners ability to 
 troubleshoot issues



[jira] [Commented] (MAPREDUCE-5450) Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1

2013-08-06 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731235#comment-13731235
 ] 

Sandy Ryza commented on MAPREDUCE-5450:
---

The attached patch is failing to compile with the following error:
{code}

[javac] 
/home/sandy/svn-apache/branch-1/src/mapred/org/apache/hadoop/mapred/ReduceTask.java:2445:
 non-static variable conf cannot be referenced from a static context
[javac]  mo.data, 0, mo.data.length, 
conf);
{code}

 Unnecessary Configuration instantiation in IFileInputStream slows down merge 
 - Port to branch-1
 ---

 Key: MAPREDUCE-5450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5450
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
 Fix For: 1.2.1

 Attachments: MAPREDUCE-5450-1.1.0.txt


 We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading 
 from 4.1.2 to 4.3.0, I have noticed some performance deterioration in our MR 
 job in the Reduce phase. The MR job usually has 10 000 map tasks (10 000 
 files on input, each about 100MB) and 6 000 reducers (one reducer per table 
 region). I was trying to figure out at which phase the slowdown appears 
 (at first I suspected that slow gathering of the map output files was the 
 culprit) and found out that the problem is not reading the map output (the 
 shuffle) but the sort/merge phase that follows - the last and actual reduce 
 phase is fast. I tried raising io.sort.factor because I thought that lots of 
 small files were being merged on disk, but upping it to 1000 didn't make any 
 difference. I then printed the stack trace and found out that the problem is 
 the initialization of org.apache.hadoop.mapred.IFileInputStream, namely the 
 creation of the Configuration object, which is not propagated along from the 
 earlier context; see the stack trace:
 Thread 13332: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) 
 @bci=136, line=999 (Compiled frame)
  - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) 
 @bci=3, line=966 (Compiled frame)
  - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, 
 line=146 (Compiled frame)
  - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction, 
 java.security.AccessControlContext) @bci=0 (Compiled frame)
  - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 
 (Compiled frame)
  - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 
 (Compiled frame)
  - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, 
 line=1192 (Compiled frame)
  - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedAction) 
 @bci=0 (Compiled frame)
  - 
 javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, 
 java.lang.String) @bci=10, line=89 (Compiled frame)
  - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) 
 @bci=38, line=250 (Interpreted frame)
  - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) 
 @bci=273, line=223 (Interpreted frame)
  - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, 
 org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, 
 java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 
 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, 
 line=712 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, 
 line=731 (Compiled frame)
  - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) 
 @bci=2, line=1047 (Interpreted frame)
  - org.apache.hadoop.mapred.IFileInputStream.init(java.io.InputStream, 
 long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted 
 frame)
  - 

[jira] [Commented] (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2013-08-06 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731239#comment-13731239
 ] 

Mariappan Asokan commented on MAPREDUCE-1176:
-

BitsOfInfo,
   For each split, you need to compute how many bytes to skip (to account for 
a partial record that spans the previous and current splits). Let us say we 
are processing split N (where N is a 0-based number) in the record reader, Z 
is the cumulative total of split sizes for splits 0 through N-1, L is the 
record length, and S is the number of bytes to skip at the beginning of split 
N. When N = 0, S = 0; for all other N, S = L - (Z mod L).

The record reader should account for the last record in a split by reading 
additional bytes from the next split if necessary.

Hope I clarified the logic.

-- Asokan
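The skip computation described here can be sketched as follows. Note the trailing `% L` is an added assumption for the case where Z is an exact multiple of L, so the split already starts on a record boundary and nothing needs skipping (consistent with S = 0 for N = 0):

```java
// S: bytes to skip at the start of split N, where z is the total size
// of splits 0..N-1 and l is the fixed record length.
class SkipBytes {
    static long bytesToSkip(long z, long l) {
        // (l - z % l) gives the remainder of the record straddling the
        // split boundary; the outer % l maps a full l back to 0 when the
        // split begins exactly on a record boundary.
        return (l - z % l) % l;
    }
}
```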



[jira] [Updated] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-5311:
--

 Priority: Blocker  (was: Major)
Fix Version/s: 2.1.0-beta

We need to take care of this for 2.1.0, making it a blocker.

 Replace SLOTS_MILLIS counters with MEM_MILLIS
 -

 Key: MAPREDUCE-5311
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5311
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Sandy Ryza
Priority: Blocker
 Fix For: 2.1.0-beta

 Attachments: MAPREDUCE-5311-1.patch, MAPREDUCE-5311.patch, 
 MAPREDUCE-5311.patch


 Per discussion in MAPREDUCE-5310 and comments in the code we should remove 
 all the related logic and just leave the counter constant for backwards 
 compatibility and deprecate the counter constants.



[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal

2013-08-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5384:


Attachment: mr-5384-2.patch

[~sseth], very good point. In my testing, I noticed the renewal RPC takes about 
5 ms and could take longer.

Uploading a patch that addresses this issue:
# Call token.renew() outside of a synchronized block.
# To address the potential race with cancel(), cancel() now returns a boolean - 
true if it successfully cancels all renewals, false if a renewal is currently 
in progress. Renewing once after cancel() is called is benign, but the caller 
can be informed of the in-progress renewal.
# TestDelegationTokenRenewal uses this information to allow for an extra renewal.

Testing: Ran the new TestDelegationTokenRenewal in a loop 10 times and it 
passed.
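A hedged sketch of the renew-outside-lock pattern described above; the class and method names are hypothetical, not the actual patch code:

```java
// Sketch: perform the slow renewal RPC outside the lock, and have
// cancel() report whether a renewal was in flight when it ran.
class RenewalSketch {
    private final Object lock = new Object();
    private boolean cancelled = false;
    private boolean renewing = false;

    /** Returns false if the token was already cancelled (no renewal done). */
    boolean renewOnce(Runnable renewRpc) {
        synchronized (lock) {
            if (cancelled) {
                return false;
            }
            renewing = true;
        }
        try {
            renewRpc.run();        // potentially slow RPC, outside the lock
        } finally {
            synchronized (lock) {
                renewing = false;
            }
        }
        return true;
    }

    /** Returns true iff no renewal was in progress at cancellation time. */
    boolean cancel() {
        synchronized (lock) {
            cancelled = true;
            return !renewing;
        }
    }
}
```

A renewal that races with cancel() is allowed to finish (benign, as noted above); the boolean from cancel() lets the caller know one extra renewal may still complete.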

 Races in DelegationTokenRenewal
 ---

 Key: MAPREDUCE-5384
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5384
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.0, 1.1.2, 1.2.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
 Attachments: mr-5384-0.patch, mr-5384-1.patch, mr-5384-2.patch


 There are a couple of races in DelegationTokenRenewal. 
 One of them was addressed by MAPREDUCE-4860, which introduced a deadlock 
 while fixing this race. Opening a new JIRA per discussion in MAPREDUCE-5364, 
 since MAPREDUCE-4860 is already shipped in a release.
 Races to fix:
 # TimerTask#cancel() disallows future invocations of run(), but doesn't abort 
 an already scheduled/started run().
 # In the context of DelegationTokenRenewal, RenewalTimerTask#cancel() only 
 cancels that TimerTask instance. However, it has no effect on any other 
 TimerTasks created for that token. 



[jira] [Updated] (MAPREDUCE-5384) Races in DelegationTokenRenewal

2013-08-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5384:


Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-5384) Races in DelegationTokenRenewal

2013-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731268#comment-13731268
 ] 

Hadoop QA commented on MAPREDUCE-5384:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12596421/mr-5384-2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3934//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731444#comment-13731444
 ] 

Arun C Murthy commented on MAPREDUCE-5311:
--

This will break every single MR app. I'm -1 on this.



[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731445#comment-13731445
 ] 

Arun C Murthy commented on MAPREDUCE-5311:
--

To Sandy's point - several folks do use mapred.cluster.(map,reduce).memory.mb, 
so it will be a big BC break for them.

Also, what does this mean for BC after we spent all that time fixing 
MAPREDUCE-5108?



[jira] [Commented] (MAPREDUCE-5311) Replace SLOTS_MILLIS counters with MEM_MILLIS

2013-08-06 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731466#comment-13731466
 ] 

Sandy Ryza commented on MAPREDUCE-5311:
---

I am missing how the change would break every single MR app - we would leave 
the API intact.  However, does this sound like a reasonable compromise:  We 
keep SLOTS_MILLIS_MAPS, but instead of trying to map resources to slots (which 
will make even less sense when we introduce other resources), we compute it as 
the sum of the execution times for all map tasks.  Essentially 
SLOTS_MILLIS_MAPS functions as CONTAINERS_MILLIS_MAPS would and is computed 
independently from container capabilities.  This will break semantic 
compatibility in some cases for people using the 
mapred.cluster.(map,reduce).memory.mb properties, but it will restore semantic 
compatibility for people not using those properties (which I believe is a much 
larger number). It will also make the counter function in a more understandable 
way that is more in line with its description.




[jira] [Updated] (MAPREDUCE-4661) Add HTTPS for WebUIs on Branch-1

2013-08-06 Thread Michael Weng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Weng updated MAPREDUCE-4661:


Attachment: branch-1.2-patch.txt7

Found an error in JobHistory.java that breaks TestJobHistory and 
TestLostTracker. New patch is attached. Changes compared with previous patch:
---
-Keys.TRACKER_NAME, Keys.HTTP_PORT,
+Keys.TRACKER_NAME, Keys.HTTP_PORT, Keys.SHUFFLE_PORT,

-   * @return the taskLogsUrl. null if shuffle-port or tracker-name or
+   * @return the taskLogsUrl. null if http-port or tracker-name or
---

TestFileCreation and TestNNThroughputBenchmark pass when run individually 
after cleaning up.

The following two test cases fail both with and without the changes in the 
patch.

[junit] Test org.apache.hadoop.io.compress.TestCodec FAILED
[junit] Test org.apache.hadoop.fs.TestFsShellReturnCode FAILED

 Add HTTPS for WebUIs on Branch-1
 

 Key: MAPREDUCE-4661
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4661
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security, webapps
Affects Versions: 1.0.3
Reporter: Plamen Jeliazkov
Assignee: Michael Weng
 Attachments: branch-1.2-patch.txt, branch-1.2-patch.txt2, 
 branch-1.2-patch.txt3, branch-1.2-patch.txt4, branch-1.2-patch.txt5, 
 branch-1.2-patch.txt6, branch-1.2-patch.txt7, MAPREDUCE-4461.patch, 
 MAPREDUCE-4661.patch, MAPREDUCE-4661.patch, MAPREDUCE-4661.patch


 After investigating the methodology used to add HTTPS support in branch-2, I 
 feel that this same approach should be back-ported to branch-1. I have taken 
 many of the patches used for branch-2 and merged them in.
 I was working on top of HDP 1 at the time - I will provide a patch for trunk 
 soon once I can confirm I am adding only the necessities for supporting HTTPS 
 on the webUIs.
 As an added benefit -- this patch actually provides an HTTPS webUI to HBase 
 by extension, if you take a hadoop-core jar compiled with this patch, put it 
 into the hbase/lib directory, and apply the necessary configs to hbase/conf.
 == OLD IDEA(s) BEHIND ADDING HTTPS (look @ Sept 17th patch) ==
 In order to provide full security around the cluster, the webUI should also 
 be secure if desired to prevent cookie theft and user masquerading. 
 Here is my proposed work. Currently I can only add HTTPS support. I do not 
 know how to switch reliance of the HttpServer from HTTP to HTTPS fully.
 In order to facilitate this change I propose the following configuration 
 additions:
 CONFIG PROPERTY - DEFAULT VALUE
 mapred.https.enable - false
 mapred.https.need.client.auth - false
 mapred.https.server.keystore.resource - ssl-server.xml
 mapred.job.tracker.https.port - 50035
 mapred.job.tracker.https.address - IP_ADDR:50035
 mapred.task.tracker.https.port - 50065
 mapred.task.tracker.https.address - IP_ADDR:50065
 I tested this on my local box after using keytool to generate an SSL 
 certificate. You will need to change ssl-server.xml to point to the .keystore 
 file afterwards. A truststore may not be necessary; you can just point it to 
 the keystore.
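
 A sketch of what the proposed mapred-site.xml properties and the referenced 
 ssl-server.xml entries might look like (values are illustrative; the 
 ssl-server.xml property names follow the branch-2 convention and are 
 assumptions here, not part of this patch's text):

 <!-- mapred-site.xml: enable HTTPS on the JobTracker/TaskTracker web UIs -->
 <property>
   <name>mapred.https.enable</name>
   <value>true</value>
 </property>
 <property>
   <name>mapred.job.tracker.https.address</name>
   <value>0.0.0.0:50035</value>
 </property>

 <!-- ssl-server.xml: point at the keystore generated with keytool -->
 <property>
   <name>ssl.server.keystore.location</name>
   <value>${user.home}/.keystore</value>
 </property>
 <property>
   <name>ssl.server.keystore.password</name>
   <value>changeme</value>
 </property>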

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5433) use mapreduce to parse hfiles and output keyvalue

2013-08-06 Thread rulinma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731584#comment-13731584
 ] 

rulinma commented on MAPREDUCE-5433:


import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class NHFileInputFormat extends
        FileInputFormat<ImmutableBytesWritable, Result> {

    private class HFileRecordReader extends
            RecordReader<ImmutableBytesWritable, Result> {
        private HFile.Reader reader;
        private final HFileScanner scanner;
        private int entryNumber = 0;
        private ImmutableBytesWritable key = null;
        private Result result = null;
        List<KeyValue> tmpList = new ArrayList<KeyValue>();

        public HFileRecordReader(FileSplit split, Configuration conf)
                throws IOException {
            final Path path = split.getPath();
            reader = HFile.createReader(FileSystem.get(conf), path,
                    new CacheConfig(conf));
            scanner = reader.getScanner(false, false, false);
            scanner.seekTo();
        }

        @Override
        public void close() throws IOException {
            if (reader != null) {
                reader.close();
            }
        }

        @Override
        public ImmutableBytesWritable getCurrentKey() throws IOException,
                InterruptedException {
            return key;
        }

        @Override
        public Result getCurrentValue() throws IOException,
                InterruptedException {
            result = new Result(tmpList);
            return result;
        }

        @Override
        public boolean nextKeyValue() throws IOException,
                InterruptedException {
            // clear the cells collected for the previous row
            tmpList.clear();
            // first entry
            if (entryNumber == 0) {
                entryNumber++;
            }
            if (entryNumber > reader.getEntries()) {
                return false;
            }
            key = new ImmutableBytesWritable(scanner.getKeyValue().getRow());

            tmpList.add(scanner.getKeyValue());
            while (scanner.next()) {
                entryNumber++;
                // step to the next cell and compare row keys
                if (compareBytes(key.get(),
                        scanner.getKeyValue().getRow())) {
                    // same row
                    tmpList.add(scanner.getKeyValue());
                } else {
                    return true;
                }
            }
            // last row of the file
            if (reader.getEntries() > 0 && (entryNumber ==
                    reader.getEntries())) {
                entryNumber++;
                return true;
            }

            return false;
        }

        @Override
        public float getProgress() throws IOException,
                InterruptedException {
            if (reader != null) {
                // cast to float to avoid integer division
                return ((float) entryNumber / reader.getEntries());
            }
            return 1;
        }

        @Override
        public void initialize(InputSplit arg0, TaskAttemptContext arg1)
                throws IOException, InterruptedException {
            // System.out.println("init");
        }

    }

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;
    }

@Override
public 

[jira] [Commented] (MAPREDUCE-5433) use mapreduce to parse hfiles and output keyvalue

2013-08-06 Thread rulinma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13731586#comment-13731586
 ] 

rulinma commented on MAPREDUCE-5433:


Parse the HFiles into a table:

import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.hbase.client.Result;


public class HFileMapperTable {

    public static class MyMap extends
            Mapper<ImmutableBytesWritable, Result,
                    ImmutableBytesWritable, Put> {
        public static Counter ct = null;

        public void map(ImmutableBytesWritable key, Result value,
                Context context) throws IOException,
                InterruptedException {

            Put put = new Put(key.copyBytes());
            List<KeyValue> kvList = value.list();
            for (KeyValue kv : kvList) {
                put.add(kv);
            }

            context.write(key, put);
            ct = context.getCounter("rowCount", "totalRow");
            ct.increment(1);
        }

        public void setup(Context context) {

        }
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        Job job = new Job(conf, "HFileMapperTable2");

        job.setJarByClass(HFileMapperTableTwo.class);
        job.setMapperClass(MyMap.class);
        job.setInputFormatClass(HFileInputFormatTwo.class);

        FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
        List<FileStatus> result = new ArrayList<FileStatus>();
        addInputPathRecursively(result, fs, new Path(args[0]));
        String inputPath = "";

        for (FileStatus f : result) {
            inputPath = f.getPath() + "," + inputPath;
        }
        // drop the trailing comma
        if (inputPath.length() > 0) {
            inputPath = inputPath.substring(0, inputPath.length() - 1);
        }

        HFileInputFormatTwo.addInputPaths(job, inputPath);

        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);

        job.setNumReduceTasks(0);
        TableMapReduceUtil.initTableReducerJob(args[1], null, job);

        job.waitForCompletion(true);
        System.out.println("hfile parsed.");
    }

    public static void addInputPathRecursively(List<FileStatus> result,
            FileSystem fs, Path path) throws IOException {
        for (FileStatus stat : fs.listStatus(path)) {
            if (stat.isDirectory()) {
                addInputPathRecursively(result, fs,
                        stat.getPath());
            } else {
                result.add(stat);
            }
        }
    }
}
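
The record reader in the earlier comment relies on one behavior: consecutive cells that share a row key are collapsed into a single Result. That grouping logic can be sketched standalone, without the HBase types; the class and helper names below are illustrative only, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGrouper {
    // Group consecutive (row, value) pairs that share a row key, mirroring
    // how nextKeyValue() accumulates same-row cells into tmpList before
    // emitting one Result per row.
    static List<List<String[]>> group(List<String[]> cells) {
        List<List<String[]>> rows = new ArrayList<List<String[]>>();
        List<String[]> current = new ArrayList<String[]>();
        String currentRow = null;
        for (String[] cell : cells) {
            if (currentRow != null && !currentRow.equals(cell[0])) {
                // row key changed: emit the completed row
                rows.add(current);
                current = new ArrayList<String[]>();
            }
            currentRow = cell[0];
            current.add(cell);
        }
        if (!current.isEmpty()) {
            // flush the final row, like the "last row of the file" branch
            rows.add(current);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
                new String[] {"r1", "a"}, new String[] {"r1", "b"},
                new String[] {"r2", "c"});
        List<List<String[]>> rows = group(cells);
        System.out.println(rows.size());        // prints 2
        System.out.println(rows.get(0).size()); // prints 2
    }
}
```

This assumes the input cells arrive sorted by row key, which holds for HFiles since their cells are stored in sorted order.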


 use mapreduce to parse hfiles and output keyvalue
 -

 Key: MAPREDUCE-5433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5433
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: examples
Reporter: rulinma
Assignee: rulinma
Priority: Minor


