[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847253#comment-13847253 ] viswanathan commented on MAPREDUCE-5351: Hi Chris/Sandy, Do I need to apply the JobInProgress_JobHistory.patch as well to resolve this issue? I haven't hit the OOME since applying MAPREDUCE-5508, but the JobTracker hangs after 7 to 8 days. Please help. Thanks, Viswa.J JobTracker memory leak caused by CleanupQueue reopening FileSystem -- Key: MAPREDUCE-5351 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.1.2 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Fix For: 1-win, 1.2.1 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch When a job is completed, closeAllForUGI is called to close all the cached FileSystems in the FileSystem cache. However, the CleanupQueue may run after this occurs and call FileSystem.get() to delete the staging directory, adding a FileSystem to the cache that will never be closed. People on the user-list have reported this causing their JobTrackers to OOME every two weeks. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
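The lifecycle in this report can be sketched with a toy cache. This is a deliberately simplified, hypothetical model (the class below is invented for illustration; Hadoop's real FileSystem.Cache keys on more than the user), but it shows how an entry created after the bulk close is orphaned:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the race described above: closeAllForUGI() empties the cache,
// then a late CleanupQueue thread calls get() and re-inserts an entry that
// nothing will ever close again. All names here are illustrative only.
class ToyFileSystemCache {
    static class FileSystem {
        boolean closed = false;
        void close() { closed = true; }
    }

    private final Map<String, FileSystem> cache = new HashMap<>();

    // Mirrors FileSystem.get(): create and cache an instance on a miss.
    synchronized FileSystem get(String ugi) {
        return cache.computeIfAbsent(ugi, k -> new FileSystem());
    }

    // Mirrors FileSystem.closeAllForUGI(): close and drop the user's entry.
    synchronized void closeAllForUGI(String ugi) {
        FileSystem fs = cache.remove(ugi);
        if (fs != null) fs.close();
    }

    synchronized int size() { return cache.size(); }

    public static void main(String[] args) {
        ToyFileSystemCache cache = new ToyFileSystemCache();
        cache.get("user1");            // job runs, FileSystem gets cached
        cache.closeAllForUGI("user1"); // job completes, cache cleaned up
        cache.get("user1");            // CleanupQueue runs late: re-cached, leaked
        System.out.println(cache.size()); // prints 1: an entry no one will close
    }
}
```

One such orphaned entry per completed job is why the leak grows steadily until the JobTracker OOMEs.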
[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847405#comment-13847405 ] Hudson commented on MAPREDUCE-5674: --- FAILURE: Integrated in Hadoop-Yarn-trunk #420 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/420/]) MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java Missing start and finish time in mapred.JobStatus - Key: MAPREDUCE-5674 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 3.0.0, 2.3.0 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch The JobStatus obtained from the JobClient or RunningJob has no start or finish time for the job -- the start and finish time are always 0. This is a regression with respect to the 1.0 mapreduce client and JobStatus API. It can also lead to regressions in downstream projects. For example, we discovered the problem in WebHCat, where the job status for a mapreduce job submitted to WebHCat always reports the start time as 0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847489#comment-13847489 ] Hudson commented on MAPREDUCE-5674: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1611 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1611/]) MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java Missing start and finish time in mapred.JobStatus - Key: MAPREDUCE-5674 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 3.0.0, 2.3.0 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch The JobStatus obtained from the JobClient or RunningJob has no start or finish time for the job -- the start and finish time are always 0. This is a regression with respect to the 1.0 mapreduce client and JobStatus API. It can also lead to regressions in downstream projects. For example, we discovered the problem in WebHCat, where the job status for a mapreduce job submitted to WebHCat always reports the start time as 0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847551#comment-13847551 ] Hudson commented on MAPREDUCE-5674: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1637/]) MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java Missing start and finish time in mapred.JobStatus - Key: MAPREDUCE-5674 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.2.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 3.0.0, 2.3.0 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch The JobStatus obtained from the JobClient or RunningJob has no start or finish time for the job -- the start and finish time are always 0. This is a regression with respect to the 1.0 mapreduce client and JobStatus API. It can also lead to regressions in downstream projects. For example, we discovered the problem in WebHCat, where the job status for a mapreduce job submitted to WebHCat always reports the start time as 0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847554#comment-13847554 ] Jason Lowe commented on MAPREDUCE-5641: --- How will the JHS copy the file to the intermediate directory? It likely won't have access to the staging directory containing the jhist file. History for failed Application Masters should be made available to the Job History Server - Key: MAPREDUCE-5641 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, jobhistoryserver Affects Versions: 2.2.0 Reporter: Robert Kanter Assignee: Robert Kanter Currently, the JHS has no information about jobs whose AMs have failed. This is because the History is written by the AM to the intermediate folder just before finishing, so when the AM fails for any reason, this information isn't copied there. However, it is not lost, as it's in the AM's staging directory. To make the History available in the JHS, all we need is another mechanism to move the History from the staging directory to the intermediate directory. The AM also writes a Summary file before exiting normally, which is likewise unavailable when the AM fails. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners
[ https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847562#comment-13847562 ] Tom White commented on MAPREDUCE-3310: -- This seems like a reasonable addition to me. The implementation preserves the current behaviour by default, so it's a safe change. Regarding compatibility, JobContext is an interface but it's marked @Evolving, so adding a new method in the next minor release is permitted. I agree with Joshua's naming suggestions, so it would be good to address those. Custom grouping comparator cannot be set for Combiners -- Key: MAPREDUCE-3310 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 0.20.1 Environment: All Reporter: Mathias Herberts Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-3310-branch-1.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch Combiners are often described as 'Reducers running on the Map side'. As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values associated with the 'same' key. For Reducers, the comparator used for grouping values can be set independently of that used to sort the keys (using Job.setGroupingComparatorClass). Such a configuration is not possible for Combiners, meaning some things done in Reducers cannot be done in Combiners (such as secondary sort). It would be handy to have a Job.setCombinerGroupingComparatorClass method that would allow the setting of the grouping comparator used when applying a Combiner. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
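For background on what the requested setter would enable: a grouping comparator makes the framework treat distinct composite keys as equal for a single reduce (or, with this change, combine) invocation, which is the basis of secondary sort. The sketch below models that grouping outside Hadoop; the Key class, field names, and data are invented for the example, and only the Job setters named in the issue come from the source:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Models how a sort comparator and a *separate* grouping comparator interact
// in secondary sort. All names here are illustrative, not Hadoop classes.
public class GroupingDemo {
    static class Key {
        final String word; final int ts;
        Key(String word, int ts) { this.word = word; this.ts = ts; }
    }

    // Sort comparator: full ordering by natural key, then secondary field.
    static final Comparator<Key> SORT =
        Comparator.comparing((Key k) -> k.word).thenComparingInt(k -> k.ts);

    // Grouping comparator: two keys are "equal" if the natural key matches,
    // so all timestamps for one word land in a single reduce()/combine() call.
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.word);

    // Group consecutive sorted keys the way the MR framework would.
    static List<List<Key>> group(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : sorted) {
            if (groups.isEmpty()
                || GROUP.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
            new Key("b", 2), new Key("a", 3), new Key("a", 1), new Key("b", 1));
        // Two groups ("a" and "b"), values ordered by timestamp inside each.
        System.out.println(group(keys).size()); // prints 2
    }
}
```

On the reduce side this is wired up with Job.setSortComparatorClass and Job.setGroupingComparatorClass; the issue asks for a third setter so the same grouping can apply when the combiner runs.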
[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847640#comment-13847640 ] Chris Nauroth commented on MAPREDUCE-5351: -- No, AFAIK the JobInProgress_JobHistory.patch is not required, and this leak has been fixed by the patches already committed to svn for MAPREDUCE-5351 and MAPREDUCE-5508. If you're seeing a hang, but not a memory leak, then this sounds like a different issue. For example, MAPREDUCE-5606 is another known bug that can cause a JT to hang. JobTracker memory leak caused by CleanupQueue reopening FileSystem -- Key: MAPREDUCE-5351 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.1.2 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Fix For: 1-win, 1.2.1 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch When a job is completed, closeAllForUGI is called to close all the cached FileSystems in the FileSystem cache. However, the CleanupQueue may run after this occurs and call FileSystem.get() to delete the staging directory, adding a FileSystem to the cache that will never be closed. People on the user-list have reported this causing their JobTrackers to OOME every two weeks. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (MAPREDUCE-2353) Make the MR changes to reflect the API changes in SecureIO library
[ https://issues.apache.org/jira/browse/MAPREDUCE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony resolved MAPREDUCE-2353. - Resolution: Fixed Make the MR changes to reflect the API changes in SecureIO library -- Key: MAPREDUCE-2353 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2353 Project: Hadoop Map/Reduce Issue Type: Improvement Components: security, task, tasktracker Affects Versions: 0.22.0 Reporter: Devaraj Das Assignee: Benoy Antony Fix For: 0.22.1 Attachments: MR-2353.patch, mr-2353-0.22.patch Make the MR changes to reflect the API changes in SecureIO library. Specifically, the 'group' argument is never used in the SecureIO library, and hence the API changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847742#comment-13847742 ] Kihwal Lee commented on MAPREDUCE-5623: --- +1 The patch looks good to me. TestJobCleanup fails because of RejectedExecutionException and NPE. --- Key: MAPREDUCE-5623 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Jason Lowe Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, MAPREDUCE-5623.3.patch org.apache.hadoop.mapred.TestJobCleanup can fail because of RejectedExecutionException by NonAggregatingLogHandler. This problem is described in YARN-1409. TestJobCleanup can still fail after fixing RejectedExecutionException, because of NPE by Job#getCounters()'s returning null. {code} --- Test set: org.apache.hadoop.mapred.TestJobCleanup --- Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup) Time elapsed: 31.068 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199) at org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5611: -- Target Version/s: 1.3.0 (was: trunk) Affects Version/s: (was: trunk) 1.2.1 Fix Version/s: (was: trunk) CombineFileInputFormat creates more rack-local tasks due to less split location info. - Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. I ran a Hive query on approximately 1.2 GB of data with CombineHiveInputFormat, which internally uses CombineFileInputFormat. My cluster has 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, Hive consistently creates all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local and 1 data-local task. When the replication factor is 9 (equal to the cluster size), all the tasks should be data-local, as each datanode contains all the replicas of the input data, but that is not happening, i.e. all the tasks are rack-local. When I dug into the CombineFileInputFormat.java code in the getMoreSplits method, I found the issue in the following snippet (especially in the case of a higher replication factor):
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
       nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();

  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;

      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
}
{code}
The first node in the map nodeToBlocks has all the replicas of the input file, so the above code creates 6 splits, all with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847999#comment-13847999 ] Sandy Ryza commented on MAPREDUCE-5611: --- If I understand correctly, the issue is that, even when the blocks referred to by a split reside on multiple nodes, CombineFileInputFormat creates the split with its locations referring to only a single node. The proposed fix is to create the split with all nodes that hold a replica of any of the blocks included in the split. This would allow the scheduler to prefer placing the tasks on those nodes. However, the proposed change could cause performance regressions in situations where we are combining many small files. Imagine a 1000-node cluster where we have created a split composed of 1000 small files that all have a replica on a single node. The other replicas for these files are likely spread out on nodes all over the cluster. If we go with the proposed approach, we would end up requesting every node on the cluster, even though we are really only likely to get a data-local speedup if the task gets placed on the node where all the files are together. A fix that would not have this performance implication would be to create a split with the nodes that are in the *intersection* of the nodes that the split's blocks reside on. So if a split contains two blocks, one that resides on node1, node2, and node3, and another that resides on node2, node3, and node4, we would set the split's locations to node2 and node3. If we choose to go with the second route, it would be good to do some quick back-of-the-envelope math to confirm that the time spent computing these intersections is worth the data-local speedup we could get. CombineFileInputFormat creates more rack-local tasks due to less split location info. 
- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. I ran a Hive query on approximately 1.2 GB of data with CombineHiveInputFormat, which internally uses CombineFileInputFormat. My cluster has 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, Hive consistently creates all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local and 1 data-local task. When the replication factor is 9 (equal to the cluster size), all the tasks should be data-local, as each datanode contains all the replicas of the input data, but that is not happening, i.e. all the tasks are rack-local. When I dug into the CombineFileInputFormat.java code in the getMoreSplits method, I found the issue in the following snippet (especially in the case of a higher replication factor):
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
       nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();

  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;

      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
}
{code}
The first node in the map nodeToBlocks has all the replicas of the input file, so the above code creates 6 splits, all with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
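The intersection approach discussed in the comment above can be sketched as follows; the helper name and data shapes are invented for illustration, and this is not the actual patch:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: a split's locations become the nodes that hold a replica of *every*
// block in the split, rather than a single node or the union of all replicas.
public class SplitLocationIntersection {
    static Set<String> splitLocations(List<Set<String>> replicaNodesPerBlock) {
        Set<String> result = new HashSet<>(replicaNodesPerBlock.get(0));
        for (Set<String> nodes : replicaNodesPerBlock) {
            result.retainAll(nodes); // keep only nodes common to all blocks
        }
        return result;
    }

    public static void main(String[] args) {
        // The example from the comment: one block on node1-3, another on node2-4.
        Set<String> locs = splitLocations(List.of(
            Set.of("node1", "node2", "node3"),
            Set.of("node2", "node3", "node4")));
        System.out.println(locs.contains("node2") && locs.contains("node3")
            && locs.size() == 2); // prints true: split located on node2, node3
    }
}
```

Each retainAll is linear in the smaller set, so the cost per split is roughly the total number of block replicas, which bears on the back-of-the-envelope math the comment asks for.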
[jira] [Created] (MAPREDUCE-5681) TestJHSSecurity fails
Zhijie Shen created MAPREDUCE-5681: -- Summary: TestJHSSecurity fails Key: MAPREDUCE-5681 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen {code} --- T E S T S --- Running org.apache.hadoop.mapreduce.security.TestJHSSecurity Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity) Time elapsed: 1.56 sec ERROR! java.lang.NullPointerException: null at java.util.Hashtable.get(Hashtable.java:334) at java.util.Properties.getProperty(Properties.java:932) at org.apache.hadoop.conf.Configuration.get(Configuration.java:874) at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892) at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101) at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100) Results : Tests in error: TestJHSSecurity.testDelegationToken:100 ? 
NullPointer Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 {code} Did some preliminary investigation, in HistoryClientService: {code} .withHttpSpnegoPrincipalKey( JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY) {code} MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5681) TestJHSSecurity fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848077#comment-13848077 ] Jonathan Eagles commented on MAPREDUCE-5681: Can you confirm this isn't the same as YARN-1463? TestJHSSecurity fails on trunk -- Key: MAPREDUCE-5681 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen {code} --- T E S T S --- Running org.apache.hadoop.mapreduce.security.TestJHSSecurity Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity) Time elapsed: 1.56 sec ERROR! java.lang.NullPointerException: null at java.util.Hashtable.get(Hashtable.java:334) at java.util.Properties.getProperty(Properties.java:932) at org.apache.hadoop.conf.Configuration.get(Configuration.java:874) at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892) at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101) at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323) at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149) at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100) Results : Tests in error: TestJHSSecurity.testDelegationToken:100 ? 
NullPointer Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 {code} Did some preliminary investigation, in HistoryClientService: {code} .withHttpSpnegoPrincipalKey( JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY) {code} MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5611: -- Summary: CombineFileInputFormat only requests a single location per split when more could be optimal (was: CombineFileInputFormat creates more rack-local tasks due to less split location info.) CombineFileInputFormat only requests a single location per split when more could be optimal --- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. I ran a Hive query on approximately 1.2 GB of data with CombineHiveInputFormat, which internally uses CombineFileInputFormat. My cluster has 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, Hive consistently creates all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local and 1 data-local task. When the replication factor is 9 (equal to the cluster size), all the tasks should be data-local, as each datanode contains all the replicas of the input data, but that is not happening, i.e. all the tasks are rack-local. When I dug into the CombineFileInputFormat.java code in the getMoreSplits method, I found the issue in the following snippet (especially in the case of a higher replication factor):
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
       nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();

  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;

      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
}
{code}
The first node in the map nodeToBlocks has all the replicas of the input file, so the above code creates 6 splits, all with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI
Mark Wagner created MAPREDUCE-5682: -- Summary: Job Counters are not retrieved properly from web UI Key: MAPREDUCE-5682 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.1 Reporter: Mark Wagner Assignee: Mark Wagner After MAPREDUCE-4962, most job counters are coming back as 0 because the web UI retrieves them using the internal name instead of the displayName that getCounter expects:
{code}
/**
 * Returns the value of the specified counter, or 0 if the counter does
 * not exist.
 */
public synchronized long getCounter(String counterName) {
  for (Counter counter : subcounters.values()) {
    if (counter != null && counter.getDisplayName().equals(counterName)) {
      return counter.getValue();
    }
  }
  return 0L;
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
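The mismatch can be reproduced outside Hadoop with a stand-in Counter class (all names below are illustrative, not Hadoop's): a lookup that compares against the display name silently returns 0 when handed the internal name.

```java
import java.util.Collection;
import java.util.List;

// Minimal reproduction of the bug described above: getCounter() matches on
// displayName, so a caller passing the internal name always gets 0 back.
public class CounterLookupDemo {
    static class Counter {
        final String name, displayName;
        final long value;
        Counter(String name, String displayName, long value) {
            this.name = name; this.displayName = displayName; this.value = value;
        }
    }

    // Same shape as the getCounter() quoted in the issue.
    static long getCounter(Collection<Counter> counters, String counterName) {
        for (Counter counter : counters) {
            if (counter != null && counter.displayName.equals(counterName)) {
                return counter.value;
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        List<Counter> counters = List.of(
            new Counter("HDFS_BYTES_READ", "HDFS: Number of bytes read", 1024));
        // The web UI passes the internal name, so the lookup silently misses:
        System.out.println(getCounter(counters, "HDFS_BYTES_READ"));            // prints 0
        System.out.println(getCounter(counters, "HDFS: Number of bytes read")); // prints 1024
    }
}
```

Returning 0 instead of failing loudly is what makes this regression easy to miss: the UI renders plausible-looking zero counters rather than an error.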
[jira] [Updated] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated MAPREDUCE-5682: --- Affects Version/s: 1.3.0 Job Counters are not retrieved properly from web UI --- Key: MAPREDUCE-5682 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.1, 1.3.0 Reporter: Mark Wagner Assignee: Mark Wagner Attachments: MAPREDUCE-5682.1.patch After MAPREDUCE-4962, most job counters are coming back as 0 because the web UI retrieves them using the internal name instead of the displayName that getCounter expects:
{code}
/**
 * Returns the value of the specified counter, or 0 if the counter does
 * not exist.
 */
public synchronized long getCounter(String counterName) {
  for (Counter counter : subcounters.values()) {
    if (counter != null && counter.getDisplayName().equals(counterName)) {
      return counter.getValue();
    }
  }
  return 0L;
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI
[ https://issues.apache.org/jira/browse/MAPREDUCE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated MAPREDUCE-5682: --- Attachment: MAPREDUCE-5682.1.patch This patch reverts MAPREDUCE-4962. Job Counters are not retrieved properly from web UI --- Key: MAPREDUCE-5682 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.1, 1.3.0 Reporter: Mark Wagner Assignee: Mark Wagner Attachments: MAPREDUCE-5682.1.patch After MAPREDUCE-4962, most job counters are coming back as 0 because the web UI retrieves them using the internal name instead of the displayName that getCounter expects:
{code}
/**
 * Returns the value of the specified counter, or 0 if the counter does
 * not exist.
 */
public synchronized long getCounter(String counterName) {
  for (Counter counter : subcounters.values()) {
    if (counter != null && counter.getDisplayName().equals(counterName)) {
      return counter.getValue();
    }
  }
  return 0L;
}
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848154#comment-13848154 ] viswanathan commented on MAPREDUCE-5351: Hi Chris, For the JT hang, you have asked to disable the hadoop user history location. Will doing this solve the problem? What exactly happens if I set this value to NONE? Will the JT heap memory stop growing as much as it does now? Also, I'm using version 1.2.1; when the JT memory reaches 6.68/8.89 GB it hangs, roughly every ten days. I'm not able to submit jobs and the UI does not load at all. Please help. JobTracker memory leak caused by CleanupQueue reopening FileSystem -- Key: MAPREDUCE-5351 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.1.2 Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Critical Fix For: 1-win, 1.2.1 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch When a job is completed, closeAllForUGI is called to close all the cached FileSystems in the FileSystem cache. However, the CleanupQueue may run after this occurs and call FileSystem.get() to delete the staging directory, adding a FileSystem to the cache that will never be closed. People on the user-list have reported this causing their JobTrackers to OOME every two weeks. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848164#comment-13848164 ] Rajesh Balamohan commented on MAPREDUCE-5611: - Agreed, the ideal would be to compute and keep the *intersection* of the nodes in the split information. We will modify the patch to accommodate this and post the details. CombineFileInputFormat only requests a single location per split when more could be optimal --- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. I ran a hive query on approx 1.2 GB of data with CombineHiveInputFormat, which internally uses CombineFileInputFormat. My cluster has 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, hive consistently creates 6 rack-local tasks, and with replication factor 3 it creates 5 rack-local and 1 data-local task. When the replication factor is 9 (equal to the cluster size), all the tasks should be data-local, as each datanode contains all the replicas of the input data, but that is not happening, i.e. all the tasks are rack-local. When I dug into the getMoreSplits method of CombineFileInputFormat.java, I found the issue in the following snippet (especially in the case of a higher replication factor):
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
       nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();

  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;

      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
}
{code}
The first node in the map nodeToBlocks has all the replicas of the input file, so the above code creates 6 splits, each with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
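The intersection idea Rajesh describes could look roughly like the following (a hypothetical sketch, not the actual patch; `intersectLocations` and its inputs are illustrative names): for every block placed in a split, intersect the sets of nodes holding its replicas, so the split advertises every node that holds all of its blocks rather than a single one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitLocationSketch {
    // Given the replica locations of each block in a split, return the
    // nodes that hold every block, i.e. the nodes where the task would
    // be fully data-local. Assumes at least one block.
    static Set<String> intersectLocations(List<Set<String>> blockLocations) {
        Set<String> result = new HashSet<>(blockLocations.get(0));
        for (Set<String> locs : blockLocations.subList(1, blockLocations.size())) {
            result.retainAll(locs); // keep only nodes holding every block so far
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> blocks = Arrays.asList(
            new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")),
            new HashSet<>(Arrays.asList("dn2", "dn3", "dn4")));
        // dn2 and dn3 hold both blocks, so the split can list both.
        System.out.println(intersectLocations(blocks));
    }
}
```

With replication factor equal to the cluster size, the intersection is the whole cluster, so the JT could schedule each task data-local anywhere instead of being tied to one node.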
[jira] [Resolved] (MAPREDUCE-5681) TestJHSSecurity fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved MAPREDUCE-5681. Resolution: Duplicate TestJHSSecurity fails on trunk -- Key: MAPREDUCE-5681 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen
{code}
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity) Time elapsed: 1.56 sec ERROR!
java.lang.NullPointerException: null
 at java.util.Hashtable.get(Hashtable.java:334)
 at java.util.Properties.getProperty(Properties.java:932)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
 at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
 at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
 at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
 at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
 at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)

Results :

Tests in error:
 TestJHSSecurity.testDelegationToken:100 ? NullPointer

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}
Did some preliminary investigation, in HistoryClientService:
{code}
.withHttpSpnegoPrincipalKey(
    JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
{code}
MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
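The top of the stack trace explains the NPE mechanically: Configuration.get() is backed by java.util.Properties, and Properties.getProperty(null) throws NullPointerException from Hashtable.get() when the property name is null, which is consistent with the SPNEGO key not resolving. A small sketch (the key string and safeGet helper are hypothetical, added only for illustration):

```java
import java.util.Properties;

public class NullKeySketch {
    // Fail fast with a clear message instead of letting Hashtable.get(null)
    // throw a bare NPE deep inside the lookup.
    static String safeGet(Properties props, String key) {
        if (key == null) {
            throw new IllegalArgumentException("SPNEGO principal key is not set");
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        try {
            // Reproduces the Hashtable.get NPE seen in the stack trace above.
            p.getProperty(null);
        } catch (NullPointerException expected) {
            System.out.println("NPE, as in the trace");
        }
        // With a non-null (if unset) key, the lookup simply returns null.
        System.out.println(safeGet(p, "some.spnego.key") == null); // true
    }
}
```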
[jira] [Commented] (MAPREDUCE-5681) TestJHSSecurity fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848166#comment-13848166 ] Zhijie Shen commented on MAPREDUCE-5681: [~jeagles], thanks for the info. It looks like the same problem as that of YARN-1463. Will close this ticket as duplicate. TestJHSSecurity fails on trunk -- Key: MAPREDUCE-5681 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen
{code}
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity) Time elapsed: 1.56 sec ERROR!
java.lang.NullPointerException: null
 at java.util.Hashtable.get(Hashtable.java:334)
 at java.util.Properties.getProperty(Properties.java:932)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
 at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
 at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
 at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
 at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
 at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
 at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)

Results :

Tests in error:
 TestJHSSecurity.testDelegationToken:100 ? NullPointer

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}
Did some preliminary investigation, in HistoryClientService:
{code}
.withHttpSpnegoPrincipalKey(
    JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
{code}
MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration. -- This message was sent by Atlassian JIRA (v6.1.4#6159)