[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem

2013-12-13 Thread viswanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847253#comment-13847253
 ] 

viswanathan commented on MAPREDUCE-5351:


Hi Chris/Jandy,

Do I need to apply the JobInProgress_JobHistory.patch as well to resolve this 
issue? I didn't face the OOME issue after applying MAPREDUCE-5508, but the 
JobTracker hangs after 7 to 8 days.

Please help.

Thanks,
Viswa.J

 JobTracker memory leak caused by CleanupQueue reopening FileSystem
 --

 Key: MAPREDUCE-5351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
 Fix For: 1-win, 1.2.1

 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, 
 MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, 
 MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch


 When a job is completed, closeAllForUGI is called to close all the cached 
 FileSystems in the FileSystem cache.  However, the CleanupQueue may run after 
 this occurs and call FileSystem.get() to delete the staging directory, adding 
 a FileSystem to the cache that will never be closed.
 People on the user-list have reported this causing their JobTrackers to OOME 
 every two weeks.
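The leak described above can be sketched with a minimal, self-contained model of the cache behavior. The class and method names below (CachedFs, FsCache) are hypothetical stand-ins for illustration, not the real Hadoop FileSystem classes:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the FileSystem cache behavior described above.
// CachedFs and FsCache are hypothetical stand-ins, not Hadoop classes.
public class FsCacheLeakSketch {
    static class CachedFs {
        final String ugi;
        CachedFs(String ugi) { this.ugi = ugi; }
    }

    static class FsCache {
        final Map<String, CachedFs> cache = new HashMap<>();

        // Analogous to FileSystem.get(): returns a cached instance,
        // creating and caching one if absent.
        CachedFs get(String ugi) {
            return cache.computeIfAbsent(ugi, CachedFs::new);
        }

        // Analogous to FileSystem.closeAllForUGI(): drops cached entries.
        void closeAllForUGI(String ugi) {
            cache.remove(ugi);
        }

        int size() { return cache.size(); }
    }

    public static void main(String[] args) {
        FsCache cache = new FsCache();
        cache.get("user1");            // job opens a FileSystem
        cache.closeAllForUGI("user1"); // job completion closes it
        // CleanupQueue runs afterwards and re-populates the cache;
        // nothing ever closes this entry again: one leaked entry per job.
        cache.get("user1");
        System.out.println(cache.size()); // 1
    }
}
```

Run once per completed job, the re-population in the last step is what accumulates entries until the JobTracker runs out of heap.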



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847405#comment-13847405
 ] 

Hudson commented on MAPREDUCE-5674:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #420 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/420/])
MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed 
by Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java


 Missing start and finish time in mapred.JobStatus
 -

 Key: MAPREDUCE-5674
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Fix For: 3.0.0, 2.3.0

 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch


 The JobStatus obtained from the JobClient or runningJob has no start or 
 finish time for the job -- the start and finish time is always 0. This is a 
 regression with respect to 1.0 mapreduce client and JobStatus API. This can 
 also lead to regressions in downstream projects. For example, we discovered 
 a problem in WebHCat where the job status for a MapReduce job submitted to 
 WebHCat always reports the start time as 0.
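The shape of the regression can be illustrated with a small sketch: when converting between two job-status representations, the start and finish times must be copied through explicitly or callers see 0. ReportSide and StatusSide are hypothetical stand-ins, not the real TypeConverter types:

```java
// Sketch of a status conversion that drops vs. carries timestamps.
// ReportSide/StatusSide are illustrative, not Hadoop's actual types.
public class StartFinishTimeSketch {
    record ReportSide(String jobId, long startTime, long finishTime) {}
    record StatusSide(String jobId, long startTime, long finishTime) {}

    // Buggy shape: times are never propagated, so callers see 0.
    static StatusSide convertDroppingTimes(ReportSide r) {
        return new StatusSide(r.jobId(), 0L, 0L);
    }

    // Fixed shape: times carried through the conversion.
    static StatusSide convert(ReportSide r) {
        return new StatusSide(r.jobId(), r.startTime(), r.finishTime());
    }

    public static void main(String[] args) {
        ReportSide r = new ReportSide("job_1", 1000L, 2000L);
        System.out.println(convertDroppingTimes(r).startTime()); // 0
        System.out.println(convert(r).startTime());              // 1000
    }
}
```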



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847489#comment-13847489
 ] 

Hudson commented on MAPREDUCE-5674:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1611 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1611/])
MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed 
by Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java


 Missing start and finish time in mapred.JobStatus
 -

 Key: MAPREDUCE-5674
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Fix For: 3.0.0, 2.3.0

 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch


 The JobStatus obtained from the JobClient or runningJob has no start or 
 finish time for the job -- the start and finish time is always 0. This is a 
 regression with respect to 1.0 mapreduce client and JobStatus API. This can 
 also lead to regressions in downstream projects. For example, we discovered 
 a problem in WebHCat where the job status for a MapReduce job submitted to 
 WebHCat always reports the start time as 0.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5674) Missing start and finish time in mapred.JobStatus

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847551#comment-13847551
 ] 

Hudson commented on MAPREDUCE-5674:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1637 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1637/])
MAPREDUCE-5674. Missing start and finish time in mapred.JobStatus. Contributed 
by Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550472)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/TestTypeConverter.java


 Missing start and finish time in mapred.JobStatus
 -

 Key: MAPREDUCE-5674
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5674
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.2.0
Reporter: Chuan Liu
Assignee: Chuan Liu
 Fix For: 3.0.0, 2.3.0

 Attachments: MAPREDUCE-5674.2.patch, MAPREDUCE-5674.patch


 The JobStatus obtained from the JobClient or runningJob has no start or 
 finish time for the job -- the start and finish time is always 0. This is a 
 regression with respect to 1.0 mapreduce client and JobStatus API. This can 
 also lead to regressions in downstream projects. For example, we discovered 
 a problem in WebHCat where the job status for a MapReduce job submitted to 
 WebHCat always reports the start time as 0.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2013-12-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847554#comment-13847554
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

How will the JHS copy the file to the intermediate directory?  It likely won't 
have access to the staging directory containing the jhist file.

 History for failed Application Masters should be made available to the Job 
 History Server
 -

 Key: MAPREDUCE-5641
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster, jobhistoryserver
Affects Versions: 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter

 Currently, the JHS has no information about jobs whose AMs have failed.  This 
 is because the History is written by the AM to the intermediate folder just 
 before finishing, so when it fails for any reason, this information isn't 
 copied there.  However, it is not lost, as it's in the AM's staging directory.  
 To make the History available in the JHS, all we need to do is have another 
 mechanism to move the History from the staging directory to the intermediate 
 directory.  The AM also writes a Summary file before exiting normally, 
 which is also unavailable when the AM fails.  



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-3310) Custom grouping comparator cannot be set for Combiners

2013-12-13 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847562#comment-13847562
 ] 

Tom White commented on MAPREDUCE-3310:
--

This seems like a reasonable addition to me. The implementation preserves the 
current behaviour by default, so it's a safe change. Regarding compatibility, 
JobContext is an interface but it's marked @Evolving, so adding a new method in 
the next minor release is permitted.

I agree with Joshua's naming suggestions, so it would be good to address those.

 Custom grouping comparator cannot be set for Combiners
 --

 Key: MAPREDUCE-3310
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3310
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.1
 Environment: All
Reporter: Mathias Herberts
Assignee: Alejandro Abdelnur
 Attachments: MAPREDUCE-3310-branch-1.patch, 
 MAPREDUCE-3310-trunk.patch, MAPREDUCE-3310-trunk.patch, 
 MAPREDUCE-3310-trunk.patch


 Combiners are often described as 'Reducers running on the Map side'.
 As Reducers, Combiners are fed K,{V}, where {V} is built by grouping values 
 associated with the 'same' key.
 For Reducers, the comparator used for grouping values can be set 
 independently of that used to sort the keys (using 
 Job.setGroupingComparatorClass).
 Such a configuration is not possible for Combiners, meaning some things done 
 in Reducers cannot be done in Combiners (such as secondary sort).
 It would be handy to have a Job.setCombinerGroupingComparatorClass method 
 that would allow the setting of the grouping comparator used when applying a 
 Combiner.
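The grouping-comparator idea behind the request can be sketched without Hadoop: with a composite key (naturalKey, secondaryKey), the sort comparator orders by both fields, while the grouping comparator ignores the secondary field so all values for one natural key land in the same reduce (or combine) group. This is plain Java for illustration, not the Hadoop RawComparator API, and CompositeKey is a hypothetical key type:

```java
import java.util.Comparator;

// Illustration of the grouping-comparator idea with a hypothetical
// composite key. Sorting distinguishes (naturalKey, secondaryKey) pairs;
// grouping treats keys with the same naturalKey as equal, which is the
// basis of secondary sort.
public class GroupingComparatorSketch {
    record CompositeKey(String naturalKey, int secondaryKey) {}

    // Full ordering used for sorting.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing(CompositeKey::naturalKey)
                  .thenComparingInt(CompositeKey::secondaryKey);

    // Grouping comparator: ignores the secondary key entirely.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing(CompositeKey::naturalKey);

    public static void main(String[] args) {
        CompositeKey a = new CompositeKey("k1", 1);
        CompositeKey b = new CompositeKey("k1", 2);
        System.out.println(SORT.compare(a, b) < 0);   // true: distinct for sorting
        System.out.println(GROUP.compare(a, b) == 0); // true: same group
    }
}
```

The proposed Job.setCombinerGroupingComparatorClass would let the GROUP-style comparator be applied on the combine side as well, where today only the sort-side grouping can be configured.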



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem

2013-12-13 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847640#comment-13847640
 ] 

Chris Nauroth commented on MAPREDUCE-5351:
--

No, AFAIK the JobInProgress_JobHistory.patch is not required, and this leak has 
been fixed by the patches already committed to svn for MAPREDUCE-5351 and 
MAPREDUCE-5508.  If you're seeing a hang, but not a memory leak, then this 
sounds like a different issue.  For example, MAPREDUCE-5606 is another known 
bug that can cause a JT to hang.

 JobTracker memory leak caused by CleanupQueue reopening FileSystem
 --

 Key: MAPREDUCE-5351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
 Fix For: 1-win, 1.2.1

 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, 
 MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, 
 MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch


 When a job is completed, closeAllForUGI is called to close all the cached 
 FileSystems in the FileSystem cache.  However, the CleanupQueue may run after 
 this occurs and call FileSystem.get() to delete the staging directory, adding 
 a FileSystem to the cache that will never be closed.
 People on the user-list have reported this causing their JobTrackers to OOME 
 every two weeks.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (MAPREDUCE-2353) Make the MR changes to reflect the API changes in SecureIO library

2013-12-13 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony resolved MAPREDUCE-2353.
-

Resolution: Fixed

 Make the MR changes to reflect the API changes in SecureIO library
 --

 Key: MAPREDUCE-2353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2353
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security, task, tasktracker
Affects Versions: 0.22.0
Reporter: Devaraj Das
Assignee: Benoy Antony
 Fix For: 0.22.1

 Attachments: MR-2353.patch, mr-2353-0.22.patch


 Make the MR changes to reflect the API changes in SecureIO library. 
 Specifically, the 'group' argument is never used in the SecureIO library, and 
 hence the API changed.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.

2013-12-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847742#comment-13847742
 ] 

Kihwal Lee commented on MAPREDUCE-5623:
---

+1 The patch looks good to me.

 TestJobCleanup fails because of RejectedExecutionException and NPE.
 ---

 Key: MAPREDUCE-5623
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, 
 MAPREDUCE-5623.3.patch


 org.apache.hadoop.mapred.TestJobCleanup can fail because of a 
 RejectedExecutionException thrown by NonAggregatingLogHandler. This problem is 
 described in YARN-1409. TestJobCleanup can still fail after fixing the 
 RejectedExecutionException, because of an NPE caused by Job#getCounters() 
 returning null.
 {code}
 ---
 Test set: org.apache.hadoop.mapred.TestJobCleanup
 ---
 Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec 
 <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup
 testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup)  Time elapsed: 
 31.068 sec  <<< ERROR!
 java.lang.NullPointerException: null
 at 
 org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199)
 at 
 org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.

2013-12-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5611:
--

 Target Version/s: 1.3.0  (was: trunk)
Affects Version/s: (was: trunk)
   1.2.1
Fix Version/s: (was: trunk)

 CombineFileInputFormat creates more rack-local tasks due to less split 
 location info.
 -

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. I ran a Hive query 
 on approximately 1.2 GB of data with CombineHiveInputFormat, which internally 
 uses CombineFileInputFormat. My cluster has 9 datanodes and 
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, Hive consistently creates 
 all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local 
 and 1 data-local task. 
  When the replication factor is 9 (equal to the cluster size), all the tasks 
 should be data-local, as each datanode contains all the replicas of the input 
 data, but that is not happening, i.e. all the tasks are rack-local. 
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits 
 method, I found the issue in the following snippet (especially in the case of 
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
    Map.Entry<String, List<OneBlockInfo>> one = iter.next();
    nodes.add(one.getKey());
    List<OneBlockInfo> blocksInNode = one.getValue();
    // for each block, copy it into validBlocks. Delete it from
    // blockToNodes so that the same block does not appear in
    // two different splits.
    for (OneBlockInfo oneblock : blocksInNode) {
      if (blockToNodes.containsKey(oneblock)) {
        validBlocks.add(oneblock);
        blockToNodes.remove(oneblock);
        curSplitSize += oneblock.length;
        // if the accumulated split size exceeds the maximum, then
        // create this split.
        if (maxSize != 0 && curSplitSize >= maxSize) {
          // create an input split and add it to the splits array
          addCreatedSplit(splits, nodes, validBlocks);
          curSplitSize = 0;
          validBlocks.clear();
        }
      }
    }
 {code}
 The first node in the nodeToBlocks map has all the replicas of the input 
 file, so the above code creates 6 splits, all with only one location. Now if 
 the JT doesn't schedule these tasks on that node, all the tasks will be 
 rack-local, even though all the other datanodes hold the remaining replicas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat creates more rack-local tasks due to less split location info.

2013-12-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847999#comment-13847999
 ] 

Sandy Ryza commented on MAPREDUCE-5611:
---

If I understand correctly, the issue is that, even when the blocks referred to 
by a split reside on multiple nodes, CombineFileInputFormat creates the split 
with its locations referring to only a single node.  The proposed fix is 
to create the split with all nodes that are replicas for any of the blocks 
included in the split.  This would allow the scheduler to prefer placing the 
tasks on those nodes.

However, the proposed change could cause performance regressions in situations 
when we are combining many small files.  Imagine a 1000-node cluster where we 
have created a split composed of 1000 small files that all have a replica on a 
single node.  The other replicas for these files are likely spread out on nodes 
all over the cluster.  If we go with the proposed approach then we would end up 
requesting every node on the cluster, even though we are really only likely to 
get a data-local performance speedup if the task gets placed on the node where 
all the files are together.

A fix that would not have this performance implication would be to create a 
split with all the nodes that are in the *intersection* of nodes that blocks in 
the split reside on.  So if a split contains two blocks, one that resides on 
node1, node2, and node3, and another that resides on node2, node3, and node4, 
we would set the split's locations to node2 and node3.

If we choose to go with the second route, it would be good to do some quick 
back-of-the-envelope math to confirm that the time spent computing these 
intersections is worth the data-local speedup we could get.
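The intersection approach described above can be sketched in a few lines. The names are illustrative, not the actual CombineFileInputFormat code: each block in a split carries the set of nodes holding its replicas, and the split's locations become the intersection of those sets:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the proposed intersection rule: a split's locations are the
// nodes common to every block in the split. Illustrative only.
public class SplitLocationIntersection {
    static Set<String> splitLocations(List<Set<String>> blockReplicaNodes) {
        if (blockReplicaNodes.isEmpty()) {
            return Collections.emptySet();
        }
        // Start from the first block's replicas and intersect with the rest.
        Set<String> result = new HashSet<>(blockReplicaNodes.get(0));
        for (Set<String> nodes
                : blockReplicaNodes.subList(1, blockReplicaNodes.size())) {
            result.retainAll(nodes);
        }
        return result;
    }

    public static void main(String[] args) {
        // The example from the comment: one block on node1..node3,
        // another on node2..node4; the intersection is node2 and node3.
        List<Set<String>> blocks = List.of(
            Set.of("node1", "node2", "node3"),
            Set.of("node2", "node3", "node4"));
        System.out.println(new TreeSet<>(splitLocations(blocks))); // [node2, node3]
    }
}
```

Each intersection is linear in the number of replicas per block, which is the quantity the back-of-the-envelope estimate would need to weigh against the expected data-local speedup.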


 CombineFileInputFormat creates more rack-local tasks due to less split 
 location info.
 -

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. I ran a Hive query 
 on approximately 1.2 GB of data with CombineHiveInputFormat, which internally 
 uses CombineFileInputFormat. My cluster has 9 datanodes and 
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, Hive consistently creates 
 all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local 
 and 1 data-local task. 
  When the replication factor is 9 (equal to the cluster size), all the tasks 
 should be data-local, as each datanode contains all the replicas of the input 
 data, but that is not happening, i.e. all the tasks are rack-local. 
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits 
 method, I found the issue in the following snippet (especially in the case of 
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
    Map.Entry<String, List<OneBlockInfo>> one = iter.next();
    nodes.add(one.getKey());
    List<OneBlockInfo> blocksInNode = one.getValue();
    // for each block, copy it into validBlocks. Delete it from
    // blockToNodes so that the same block does not appear in
    // two different splits.
    for (OneBlockInfo oneblock : blocksInNode) {
      if (blockToNodes.containsKey(oneblock)) {
        validBlocks.add(oneblock);
        blockToNodes.remove(oneblock);
        curSplitSize += oneblock.length;
        // if the accumulated split size exceeds the maximum, then
        // create this split.
        if (maxSize != 0 && curSplitSize >= maxSize) {
          // create an input split and add it to the splits array
          addCreatedSplit(splits, nodes, validBlocks);
          curSplitSize = 0;
          validBlocks.clear();
        }
      }
    }
 {code}
 The first node in the nodeToBlocks map has all the replicas of the input 
 file, so the above code creates 6 splits, all with only one location. Now if 
 the JT doesn't schedule these tasks on that node, all the tasks will be 
 rack-local, even though all the other datanodes hold the remaining replicas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (MAPREDUCE-5681) TestJHSSecurity fails

2013-12-13 Thread Zhijie Shen (JIRA)
Zhijie Shen created MAPREDUCE-5681:
--

 Summary: TestJHSSecurity fails
 Key: MAPREDUCE-5681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen


{code}
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec 
<<< FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity)  Time 
elapsed: 1.56 sec  <<< ERROR!
java.lang.NullPointerException: null
at java.util.Hashtable.get(Hashtable.java:334)
at java.util.Properties.getProperty(Properties.java:932)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
at 
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
at 
org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at 
org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)


Results :

Tests in error: 
  TestJHSSecurity.testDelegationToken:100 ? NullPointer

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}

Did some preliminary investigation, in HistoryClientService:
{code}
.withHttpSpnegoPrincipalKey(
JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
{code}
MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5681) TestJHSSecurity fails on trunk

2013-12-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848077#comment-13848077
 ] 

Jonathan Eagles commented on MAPREDUCE-5681:


Can you confirm this isn't the same as YARN-1463?

 TestJHSSecurity fails on trunk
 --

 Key: MAPREDUCE-5681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen

 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec 
 <<< FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
 testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity)  
 Time elapsed: 1.56 sec  <<< ERROR!
 java.lang.NullPointerException: null
   at java.util.Hashtable.get(Hashtable.java:334)
   at java.util.Properties.getProperty(Properties.java:932)
   at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
   at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
   at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
   at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
   at 
 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)
 Results :
 Tests in error: 
   TestJHSSecurity.testDelegationToken:100 ? NullPointer
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
 {code}
 Did some preliminary investigation, in HistoryClientService:
 {code}
 .withHttpSpnegoPrincipalKey(
 JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
 {code}
 MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal

2013-12-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5611:
--

Summary: CombineFileInputFormat only requests a single location per split 
when more could be optimal  (was: CombineFileInputFormat creates more 
rack-local tasks due to less split location info.)

 CombineFileInputFormat only requests a single location per split when more 
 could be optimal
 ---

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. I ran a Hive query 
 on approximately 1.2 GB of data with CombineHiveInputFormat, which internally 
 uses CombineFileInputFormat. My cluster has 9 datanodes and 
 max.split.size is 256 MB.
 When I ran this query with replication factor 9, Hive consistently creates 
 all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local 
 and 1 data-local task. 
  When the replication factor is 9 (equal to the cluster size), all the tasks 
 should be data-local, as each datanode contains all the replicas of the input 
 data, but that is not happening, i.e. all the tasks are rack-local. 
 When I dug into the CombineFileInputFormat.java code in the getMoreSplits 
 method, I found the issue in the following snippet (especially in the case of 
 a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String,
      List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
      iter.hasNext();) {
    Map.Entry<String, List<OneBlockInfo>> one = iter.next();
    nodes.add(one.getKey());
    List<OneBlockInfo> blocksInNode = one.getValue();
    // for each block, copy it into validBlocks. Delete it from
    // blockToNodes so that the same block does not appear in
    // two different splits.
    for (OneBlockInfo oneblock : blocksInNode) {
      if (blockToNodes.containsKey(oneblock)) {
        validBlocks.add(oneblock);
        blockToNodes.remove(oneblock);
        curSplitSize += oneblock.length;
        // if the accumulated split size exceeds the maximum, then
        // create this split.
        if (maxSize != 0 && curSplitSize >= maxSize) {
          // create an input split and add it to the splits array
          addCreatedSplit(splits, nodes, validBlocks);
          curSplitSize = 0;
          validBlocks.clear();
        }
      }
    }
 {code}
 The first node in the nodeToBlocks map has all the replicas of the input 
 file, so the above code creates 6 splits, all with only one location. Now if 
 the JT doesn't schedule these tasks on that node, all the tasks will be 
 rack-local, even though all the other datanodes hold the remaining replicas.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI

2013-12-13 Thread Mark Wagner (JIRA)
Mark Wagner created MAPREDUCE-5682:
--

 Summary: Job Counters are not retrieved properly from web UI
 Key: MAPREDUCE-5682
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1
Reporter: Mark Wagner
Assignee: Mark Wagner


After MAPREDUCE-4962, most job counters are coming back as 0 because the web UI 
retrieves them using the name instead of the displayName, which is what the 
lookup expects: 

{code}
/**
 * Returns the value of the specified counter, or 0 if the counter does
 * not exist.
 */
public synchronized long getCounter(String counterName) {
  for(Counter counter: subcounters.values()) {
    if (counter != null && counter.getDisplayName().equals(counterName)) {
  return counter.getValue();
}
  }
  return 0L;
}{code}
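The mismatch can be reproduced with a minimal model of the quoted lookup: because it compares against the display name, a caller passing the internal name silently gets 0. MiniCounter is a hypothetical stand-in, not Hadoop's Counter class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of the name/displayName mismatch described above.
// MiniCounter is illustrative, not Hadoop's Counter class.
public class CounterLookupSketch {
    record MiniCounter(String name, String displayName, long value) {}

    static final Map<String, MiniCounter> subcounters = new LinkedHashMap<>();

    // Mirrors the quoted getCounter(): matches on the display name only.
    static long getCounter(String counterName) {
        for (MiniCounter counter : subcounters.values()) {
            if (counter != null && counter.displayName().equals(counterName)) {
                return counter.value();
            }
        }
        return 0L;
    }

    public static void main(String[] args) {
        subcounters.put("MAP_INPUT_RECORDS",
            new MiniCounter("MAP_INPUT_RECORDS", "Map input records", 42L));
        // Lookup by internal name misses and reports 0 -- the bug symptom.
        System.out.println(getCounter("MAP_INPUT_RECORDS")); // 0
        // Lookup by display name finds the value.
        System.out.println(getCounter("Map input records"));  // 42
    }
}
```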



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI

2013-12-13 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated MAPREDUCE-5682:
---

Affects Version/s: 1.3.0

 Job Counters are not retrieved properly from web UI
 ---

 Key: MAPREDUCE-5682
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1, 1.3.0
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: MAPREDUCE-5682.1.patch


 After MAPREDUCE-4962, most job counters are coming back as 0 because the web 
 UI retrieves them using the name instead of the displayName, which is what 
 the lookup expects: 
 {code}
 /**
  * Returns the value of the specified counter, or 0 if the counter does
  * not exist.
  */
 public synchronized long getCounter(String counterName) {
   for(Counter counter: subcounters.values()) {
  if (counter != null && counter.getDisplayName().equals(counterName)) {
   return counter.getValue();
 }
   }
   return 0L;
 }{code}





[jira] [Updated] (MAPREDUCE-5682) Job Counters are not retrieved properly from web UI

2013-12-13 Thread Mark Wagner (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Wagner updated MAPREDUCE-5682:
---

Attachment: MAPREDUCE-5682.1.patch

This patch reverts MAPREDUCE-4962.

 Job Counters are not retrieved properly from web UI
 ---

 Key: MAPREDUCE-5682
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5682
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.2.1, 1.3.0
Reporter: Mark Wagner
Assignee: Mark Wagner
 Attachments: MAPREDUCE-5682.1.patch


 After MAPREDUCE-4962, most job counters come back as 0 because the web UI 
 retrieves them using the name, whereas the lookup expects the displayName: 
 {code}
 /**
  * Returns the value of the specified counter, or 0 if the counter does
  * not exist.
  */
 public synchronized long getCounter(String counterName) {
   for (Counter counter : subcounters.values()) {
     if (counter != null && counter.getDisplayName().equals(counterName)) {
       return counter.getValue();
     }
   }
   return 0L;
 }{code}





[jira] [Commented] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem

2013-12-13 Thread viswanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848154#comment-13848154
 ] 

viswanathan commented on MAPREDUCE-5351:


Hi Chris,

Regarding the JT hang, you had asked me to disable the hadoop user history
location. Will doing this solve the problem? What exactly happens if I set
this value to NONE? By setting it, will the JT heap memory stop growing as
much as it does now?

Also, I'm using version 1.2.1. When the JT memory reaches 6.68/8.89 GB, the
JT hangs roughly every ten days; I'm not able to submit jobs and the UI does
not load at all.

Please help.


 JobTracker memory leak caused by CleanupQueue reopening FileSystem
 --

 Key: MAPREDUCE-5351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
 Fix For: 1-win, 1.2.1

 Attachments: JobInProgress_JobHistory.patch, MAPREDUCE-5351-1.patch, 
 MAPREDUCE-5351-2.patch, MAPREDUCE-5351-addendum-1.patch, 
 MAPREDUCE-5351-addendum.patch, MAPREDUCE-5351.patch


 When a job is completed, closeAllForUGI is called to close all the cached 
 FileSystems in the FileSystem cache.  However, the CleanupQueue may run after 
 this occurs and call FileSystem.get() to delete the staging directory, adding 
 a FileSystem to the cache that will never be closed.
 People on the user-list have reported this causing their JobTrackers to OOME 
 every two weeks.
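 The race can be sketched with a toy cache (hypothetical `CacheLeakDemo` names, 
 not Hadoop's actual FileSystem internals): closeAllForUGI empties the cache at 
 job completion, but a cleanup task running afterwards re-inserts an entry that 
 nothing will ever close.

 ```java
 import java.util.HashMap;
 import java.util.Map;

 public class CacheLeakDemo {
     // Toy stand-in for the FileSystem cache, keyed by UGI/user.
     static final Map<String, Object> cache = new HashMap<>();

     // Like FileSystem.get(): returns the cached instance, creating and caching on a miss.
     static Object get(String ugi) {
         return cache.computeIfAbsent(ugi, k -> new Object());
     }

     // Like FileSystem.closeAllForUGI(): drops every cached entry for the user.
     static void closeAllForUGI(String ugi) {
         cache.remove(ugi);
     }

     public static void main(String[] args) {
         get("jobUser");            // job code populates the cache
         closeAllForUGI("jobUser"); // job completion closes the cached entries
         get("jobUser");            // CleanupQueue runs late to delete the staging
                                    // dir, re-caching an entry no one closes again
         System.out.println(cache.size()); // 1: one leaked entry per completed job
     }
 }
 ```

 One such leaked entry per finished job is small, but on a busy JobTracker the 
 cache grows without bound, matching the OOME-every-two-weeks reports.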





[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal

2013-12-13 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848164#comment-13848164
 ] 

Rajesh Balamohan commented on MAPREDUCE-5611:
-

Agreed, the ideal approach would be to compute and use the *intersection* of 
the nodes in the split information. We will modify the patch to accommodate 
this and post the details.

 CombineFileInputFormat only requests a single location per split when more 
 could be optimal
 ---

 Key: MAPREDUCE-5611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Chandra Prakash Bhagtani
Assignee: Chandra Prakash Bhagtani
 Attachments: CombineFileInputFormat-trunk.patch


 I have come across an issue with CombineFileInputFormat. I ran a hive query 
 on approx 1.2 GB of data with CombineHiveInputFormat, which internally uses 
 CombineFileInputFormat. My cluster has 9 datanodes and max.split.size is 256 MB.
 When I ran this query with replication factor 9, hive consistently created 
 all 6 tasks as rack-local, and with replication factor 3 it created 5 
 rack-local and 1 data-local task. 
 When the replication factor is 9 (equal to the cluster size), all the tasks 
 should be data-local, as each datanode contains all the replicas of the input 
 data, but that is not happening: all the tasks are rack-local. 
 When I dug into the CombineFileInputFormat.java code, in the getMoreSplits 
 method, I found the issue in the following snippet (especially in the case 
 of a higher replication factor):
 {code:title=CombineFileInputFormat.java|borderStyle=solid}
 for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
        nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
   Map.Entry<String, List<OneBlockInfo>> one = iter.next();
   nodes.add(one.getKey());
   List<OneBlockInfo> blocksInNode = one.getValue();
   // for each block, copy it into validBlocks. Delete it from
   // blockToNodes so that the same block does not appear in
   // two different splits.
   for (OneBlockInfo oneblock : blocksInNode) {
     if (blockToNodes.containsKey(oneblock)) {
       validBlocks.add(oneblock);
       blockToNodes.remove(oneblock);
       curSplitSize += oneblock.length;
       // if the accumulated split size exceeds the maximum, then
       // create this split.
       if (maxSize != 0 && curSplitSize >= maxSize) {
         // create an input split and add it to the splits array
         addCreatedSplit(splits, nodes, validBlocks);
         curSplitSize = 0;
         validBlocks.clear();
       }
     }
   }
 {code}
 The first node in the nodeToBlocks map has all the replicas of the input 
 file, so the above code creates 6 splits, each with only one location. Now 
 if the JT doesn't schedule these tasks on that node, all the tasks will be 
 rack-local, even though all the other datanodes also hold replicas.
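 The intersection idea proposed in the comment above can be sketched as follows 
 (a hypothetical `splitLocations` helper, not the actual patch): for each split, 
 keep every node that hosts all of the split's blocks, so the scheduler sees 
 every data-local candidate rather than just the first node encountered.

 ```java
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 public class SplitLocationsDemo {
     // Intersection of the replica locations of a split's blocks: every node
     // that hosts ALL of the split's blocks is a valid data-local location.
     static Set<String> splitLocations(List<Set<String>> blockReplicaNodes) {
         Set<String> result = new HashSet<>(blockReplicaNodes.get(0));
         for (Set<String> nodes : blockReplicaNodes.subList(1, blockReplicaNodes.size())) {
             result.retainAll(nodes);
         }
         return result;
     }

     public static void main(String[] args) {
         // Replication factor equal to cluster size: every node holds every
         // block, so every node is a data-local candidate, not just one.
         List<Set<String>> blocks = List.of(
             Set.of("n1", "n2", "n3"),
             Set.of("n1", "n2", "n3"));
         System.out.println(splitLocations(blocks));
         // Blocks on disjoint nodes: no common data-local node exists.
         List<Set<String>> disjoint = List.of(Set.of("n1"), Set.of("n2"));
         System.out.println(splitLocations(disjoint)); // []
     }
 }
 ```

 With a high replication factor the intersection stays large, so the JT can 
 place each task data-local on many nodes instead of falling back to rack-local.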





[jira] [Resolved] (MAPREDUCE-5681) TestJHSSecurity fails on trunk

2013-12-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved MAPREDUCE-5681.


Resolution: Duplicate

 TestJHSSecurity fails on trunk
 --

 Key: MAPREDUCE-5681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen

 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec <<< 
 FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
 testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity)  
 Time elapsed: 1.56 sec  <<< ERROR!
 java.lang.NullPointerException: null
   at java.util.Hashtable.get(Hashtable.java:334)
   at java.util.Properties.getProperty(Properties.java:932)
   at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
   at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
   at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
   at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
   at 
 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)
 Results :
 Tests in error: 
   TestJHSSecurity.testDelegationToken:100 ? NullPointer
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
 {code}
 Did some preliminary investigation, in HistoryClientService:
 {code}
 .withHttpSpnegoPrincipalKey(
 JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
 {code}
 MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration.





[jira] [Commented] (MAPREDUCE-5681) TestJHSSecurity fails on trunk

2013-12-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848166#comment-13848166
 ] 

Zhijie Shen commented on MAPREDUCE-5681:


[~jeagles], thanks for the info. It looks like the same problem as that of 
YARN-1463. Will close this ticket as duplicate.

 TestJHSSecurity fails on trunk
 --

 Key: MAPREDUCE-5681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen

 {code}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.security.TestJHSSecurity
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.763 sec <<< 
 FAILURE! - in org.apache.hadoop.mapreduce.security.TestJHSSecurity
 testDelegationToken(org.apache.hadoop.mapreduce.security.TestJHSSecurity)  
 Time elapsed: 1.56 sec  <<< ERROR!
 java.lang.NullPointerException: null
   at java.util.Hashtable.get(Hashtable.java:334)
   at java.util.Properties.getProperty(Properties.java:932)
   at org.apache.hadoop.conf.Configuration.get(Configuration.java:874)
   at org.apache.hadoop.http.HttpServer.initSpnego(HttpServer.java:892)
   at org.apache.hadoop.http.HttpServer.access$100(HttpServer.java:101)
   at org.apache.hadoop.http.HttpServer$Builder.build(HttpServer.java:323)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:232)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.initializeWebApp(HistoryClientService.java:149)
   at 
 org.apache.hadoop.mapreduce.v2.hs.HistoryClientService.serviceStart(HistoryClientService.java:118)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
   at 
 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:175)
   at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
   at 
 org.apache.hadoop.mapreduce.security.TestJHSSecurity.testDelegationToken(TestJHSSecurity.java:100)
 Results :
 Tests in error: 
   TestJHSSecurity.testDelegationToken:100 ? NullPointer
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
 {code}
 Did some preliminary investigation, in HistoryClientService:
 {code}
 .withHttpSpnegoPrincipalKey(
 JHAdminConfig.MR_WEBAPP_SPNEGO_USER_NAME_KEY)
 {code}
 MR_WEBAPP_SPNEGO_USER_NAME_KEY seems not to be in the configuration.


