[jira] [Commented] (MAPREDUCE-4884) streaming tests fail to start MiniMRCluster due to Queue configuration missing child queue names for root
[ https://issues.apache.org/jira/browse/MAPREDUCE-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570110#comment-13570110 ] Ivan A. Veselovsky commented on MAPREDUCE-4884: --- Can this change please be back-ported to branch-2? streaming tests fail to start MiniMRCluster due to Queue configuration missing child queue names for root --- Key: MAPREDUCE-4884 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4884 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, test Affects Versions: 3.0.0, trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: MAPREDUCE-4884.1.patch Multiple tests in hadoop-streaming, such as {{TestFileArgs}}, fail to initialize {{MiniMRCluster}} due to a {{YarnException}} with the reason "Queue configuration missing child queue names for root". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
Arun A K created MAPREDUCE-4974: --- Summary: optimising the LineRecordReader initialize method Key: MAPREDUCE-4974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 0.23.5, 2.0.2-alpha Environment: Hadoop Linux Reporter: Arun A K Assignee: Arun A K Fix For: 0.20.204.0, 0.24.0 I found there is scope for optimizing the code in initialize(): compressionCodecs and codec could be instantiated only when the input is compressed. Meanwhile, Gelesh George Omathil suggested avoiding the null check of key and value. This would save time, since the null check is done every time the next key/value pair is generated. The intention is to instantiate them only once and to avoid an NPE as well. Both goals could be met by initializing key and value in the initialize() method. We have both worked on it.
[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gelesh updated MAPREDUCE-4974: -- Assignee: Gelesh (was: Arun A K) Target Version/s: 0.23.5, 0.23.4, 2.0.1-alpha, 2.0.0-alpha, 1.1.1, 1.0.4, 1.0.0 (was: 1.0.0, 1.0.4, 1.1.1, 2.0.0-alpha, 2.0.1-alpha, 0.23.4, 0.23.5) Status: Patch Available (was: Open) optimising the LineRecordReader initialize method - Key: MAPREDUCE-4974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 0.23.5, 2.0.2-alpha Environment: Hadoop Linux Reporter: Arun A K Assignee: Gelesh Labels: patch, performance Fix For: 0.20.204.0, 0.24.0 Original Estimate: 1h Remaining Estimate: 1h
[jira] [Updated] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gelesh updated MAPREDUCE-4974: -- Attachment: MAPREDUCE-4974.1.patch Combined thoughts of mine and Arun A K's.
[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570140#comment-13570140 ] Gelesh commented on MAPREDUCE-4974: --- Could somebody please review the patch? I couldn't even see Hadoop QA running on this. Kindly advise.
[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570145#comment-13570145 ] Hadoop QA commented on MAPREDUCE-4974: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567831/MAPREDUCE-4974.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3297//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3297//console This message is automatically generated. 
[jira] [Commented] (MAPREDUCE-4974) optimising the LineRecordReader initialize method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570154#comment-13570154 ] Gelesh commented on MAPREDUCE-4974: --- It's an improvement to the existing code; no features are added or removed, and hence the existing test cases should suffice.
[jira] [Updated] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun A K updated MAPREDUCE-4974: Summary: Optimising the LineRecordReader initialize() method (was: optimising the LineRecordReader initialize method)
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570161#comment-13570161 ] Arun A K commented on MAPREDUCE-4974: - Quoting the review request URL for this issue - https://reviews.apache.org/r/9287/
[jira] [Created] (MAPREDUCE-4975) streaming/gridmix docs missing
Thomas Graves created MAPREDUCE-4975: Summary: streaming/gridmix docs missing Key: MAPREDUCE-4975 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4975 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.6 Reporter: Thomas Graves The docs for hadoop streaming and gridmix weren't moved out of the mrv1 code, so they don't exist in the 0.23 or 2.x lines. For reference, the 1.x versions are at http://hadoop.apache.org/docs/r1.1.0/streaming.html and http://hadoop.apache.org/docs/r1.1.0/gridmix.html We should also check for other docs that are missing.
[jira] [Updated] (MAPREDUCE-4884) streaming tests fail to start MiniMRCluster due to Queue configuration missing child queue names for root
[ https://issues.apache.org/jira/browse/MAPREDUCE-4884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4884: --- Fix Version/s: 2.0.3-alpha Thanks for the fix. I just merged this into branch-2, because the tests were failing there still.
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570547#comment-13570547 ] Todd Lipcon commented on MAPREDUCE-4974: Do you have any benchmark that shows this helps? Null checks can often be completely optimized out by the JIT, or at least hoisted out of the tight loop.
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570567#comment-13570567 ] Gelesh commented on MAPREDUCE-4974: --- [~tlipcon] nextKeyValue() is called once for every delimiter (or newline) that occurs within a given split. Each time, it executes the code below:
-if (key == null) {
-  key = new LongWritable();
-}
-key.set(pos);
-if (value == null) {
-  value = new Text();
-}
Only on the first iteration do the conditions hold true and the key and value objects get created. The same result could be achieved by creating the key and value objects in the initialize phase, which lets us skip these null checks. Also,
-compressionCodecs = new CompressionCodecFactory(job);
-codec = compressionCodecs.getCodec(file);
needs to be done only when the input file is compressed. The patch includes this change as well.
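The eager-initialization idea discussed in this thread can be sketched as follows. This is a minimal illustration, not the attached patch: `EagerInitReaderSketch` is a hypothetical stand-in for LineRecordReader, a `long[]` and a `StringBuilder` stand in for LongWritable and Text, and the input is assumed to be ASCII with a one-character newline delimiter. It only shows where the allocation moves; whether it saves measurable time is exactly the open benchmark question raised above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for LineRecordReader; not the Hadoop class.
class EagerInitReaderSketch {
    private final Iterator<String> lines;
    private long pos;
    // Reusable holders standing in for LongWritable (key) and Text (value).
    private long[] key;
    private StringBuilder value;

    EagerInitReaderSketch(List<String> input) {
        this.lines = input.iterator();
    }

    // Allocate key and value once here, so nextKeyValue() needs no null checks.
    void initialize() {
        key = new long[1];
        value = new StringBuilder();
        pos = 0;
    }

    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        String line = lines.next();
        key[0] = pos;              // offset of the record start
        value.setLength(0);        // reuse the same buffer for every record
        value.append(line);
        pos += line.length() + 1;  // +1 for the assumed newline delimiter
        return true;
    }

    long getCurrentKey() {
        return key[0];
    }

    String getCurrentValue() {
        return value.toString();
    }

    public static void main(String[] args) {
        EagerInitReaderSketch reader =
                new EagerInitReaderSketch(Arrays.asList("ab", "cde"));
        reader.initialize();
        while (reader.nextKeyValue()) {
            // Prints "0<TAB>ab" and then "3<TAB>cde".
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
    }
}
```

One consequence of this design is that the reader must never be used before initialize() has run, since nothing re-creates the holders later.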
[jira] [Created] (MAPREDUCE-4976) Fix test failure for HADOOP-9252
Tsz Wo (Nicholas), SZE created MAPREDUCE-4976: - Summary: Fix test failure for HADOOP-9252 Key: MAPREDUCE-4976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4976 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods was deprecated by HADOOP-9252. The use of them should be replaced with the new methods.
[jira] [Updated] (MAPREDUCE-4976) Fix test failure for HADOOP-9252
[ https://issues.apache.org/jira/browse/MAPREDUCE-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-4976: -- Description: HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods were deprecated by HADOOP-9252. The use of them should be replaced with the new methods. was: HADOOP-9252 slightly changes the format of some StringUtils outputs. It may cause test failures. Also, some methods was deprecated by HADOOP-9252. The use of them should be replaced with the new methods.
[jira] [Commented] (MAPREDUCE-4964) JobLocalizer#localizeJobFiles can potentially write job.xml to the wrong user's directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570664#comment-13570664 ] Arun C Murthy commented on MAPREDUCE-4964: -- Karthik - makes sense. Please upload this patch to MAPREDUCE-4843 and we can commit via the same jira. Thanks. JobLocalizer#localizeJobFiles can potentially write job.xml to the wrong user's directory - Key: MAPREDUCE-4964 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4964 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: MR-4964.patch, MR-4964.patch
In the following code, if jobs corresponding to different users (X and Y) are localized simultaneously, it is possible that jobconf can be written to the wrong user's directory. (X's job.xml can be written to Y's directory)
{code}
public void localizeJobFiles(JobID jobid, JobConf jConf,
    Path localJobTokenFile, TaskUmbilicalProtocol taskTracker)
    throws IOException, InterruptedException {
  localizeJobFiles(jobid, jConf,
      lDirAlloc.getLocalPathForWrite(JOBCONF, ttConf),
      localJobTokenFile, taskTracker);
}
{code}
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Attachment: MAPREDUCE-4822.patch Removed unnecessary conversion, please review. Unnecessary conversions in History Events - Key: MAPREDUCE-4822 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 0.23.4 Reporter: Robert Joseph Evans Priority: Trivial Attachments: MAPREDUCE-4822.patch
There are a number of conversions in the Job History Event classes that are totally unnecessary. It appears that they were originally used to convert from the internal Avro format, but many of them no longer pull the values from Avro; they store them internally. For example:
{code:title=TaskAttemptFinishedEvent.java}
/** Get the task type */
public TaskType getTaskType() {
  return TaskType.valueOf(taskType.toString());
}
{code}
The code currently takes an enum, converts it to a string, and then asks the same enum to convert it back to an enum. If Java works properly this should be a no-op, and a reference to the original taskType should be returned. There are several places where toString is called on a string, and since strings are immutable it returns a reference to itself. The various ids are not immutable and probably should not be changed at this point.
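The round trip described in this issue can be checked in isolation. The sketch below uses a hypothetical `TaskKind` enum in place of Hadoop's TaskType and assumes the enum does not override toString(); if it did, valueOf() could throw instead of returning the same constant.

```java
// Hypothetical enum standing in for Hadoop's TaskType.
enum TaskKind { MAP, REDUCE }

class EnumRoundTripSketch {
    // Mirrors the pattern quoted in the issue: enum -> String -> enum.
    static TaskKind roundTrip(TaskKind kind) {
        return TaskKind.valueOf(kind.toString());
    }

    // The simplified form: hand back the reference directly.
    static TaskKind direct(TaskKind kind) {
        return kind;
    }

    public static void main(String[] args) {
        TaskKind kind = TaskKind.REDUCE;
        // Enum constants are singletons, so both paths yield the same reference.
        System.out.println(roundTrip(kind) == direct(kind)); // prints "true"

        String id = "attempt";
        // String.toString() returns the receiver itself.
        System.out.println(id.toString() == id); // prints "true"
    }
}
```

Since both paths return the identical reference, dropping the conversion should not change behaviour for callers under the stated assumption; it only removes a string allocation-free but pointless lookup.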
[jira] [Assigned] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla reassigned MAPREDUCE-4843: --- Assignee: Karthik Kambatla When using DefaultTaskController, JobLocalizer not thread safe -- Key: MAPREDUCE-4843 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 1.1.1 Reporter: zhaoyunjiong Assignee: Karthik Kambatla Priority: Critical Attachments: MAPREDUCE-4843-branch-1.1.patch
In our cluster, jobs sometimes fail with the exception below:
2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_23_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
 at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
 at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
 at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
 at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)
The root cause is that JobLocalizer is not thread safe. The DefaultTaskController.initializeJob method creates the localizer with: JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid); but JobLocalizer simply keeps a reference to that conf. When two TaskLauncher threads (mapLauncher and reduceLauncher) call initializeJob at the same time, there are two JobLocalizer instances but only one conf instance. So ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) will sometimes overwrite the previous job's conf, which causes the previous job's job.xml to be stored in another user's directory.
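The shared-configuration hazard described in this report can be reproduced in miniature. The sketch below is illustrative only: the classes are hypothetical stand-ins rather than Hadoop's JobLocalizer or JobConf, the key value is made up, and the two constructor calls run one after the other to imitate one possible thread interleaving deterministically. The copying variant is one possible remedy and is not claimed to be what any of the attached patches does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are the Hadoop classes.
class SharedConfSketch {
    // Illustrative key only, not the real JOB_LOCAL_CTXT value.
    static final String JOB_LOCAL_CTXT = "sketch.job.local.dirs";

    // Keeps a reference to the caller's map, as the report describes.
    static class SharingLocalizer {
        private final Map<String, String> conf;

        SharingLocalizer(Map<String, String> conf, String user) {
            this.conf = conf;
            conf.put(JOB_LOCAL_CTXT, "/local/taskTracker/" + user);
        }

        String jobConfDir() {
            return conf.get(JOB_LOCAL_CTXT);
        }
    }

    // Copies the map before writing, so each instance has a private view.
    static class CopyingLocalizer {
        private final Map<String, String> conf;

        CopyingLocalizer(Map<String, String> conf, String user) {
            this.conf = new HashMap<>(conf);
            this.conf.put(JOB_LOCAL_CTXT, "/local/taskTracker/" + user);
        }

        String jobConfDir() {
            return conf.get(JOB_LOCAL_CTXT);
        }
    }

    public static void main(String[] args) {
        Map<String, String> ttConf = new HashMap<>();

        // The second localizer overwrites the first one's setting.
        SharingLocalizer x = new SharingLocalizer(ttConf, "userX");
        SharingLocalizer y = new SharingLocalizer(ttConf, "userY");
        System.out.println(x.jobConfDir()); // prints "/local/taskTracker/userY"

        // With private copies, userX keeps its own directory.
        CopyingLocalizer cx = new CopyingLocalizer(ttConf, "userX");
        CopyingLocalizer cy = new CopyingLocalizer(ttConf, "userY");
        System.out.println(cx.jobConfDir()); // prints "/local/taskTracker/userX"
    }
}
```

Copying trades a little memory per localizer for isolation; synchronizing access to the single shared object would be another option, at the cost of serializing job initialization.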
[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-4843: Attachment: mr-4843.patch Uploading the patch from MAPREDUCE-4964 as that solves this issue in a simpler/cleaner way. The discussion on that JIRA has all the details. Applied the patch to latest branch-1 and it applies cleanly. Also, verified TestJobLocalizer passes.
[jira] [Updated] (MAPREDUCE-4964) JobLocalizer#localizeJobFiles can potentially write job.xml to the wrong user's directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated MAPREDUCE-4964: Resolution: Duplicate Status: Resolved (was: Patch Available) Thanks Arun. Closing this JIRA as a duplicate of MAPREDUCE-4843; I uploaded the latest patch from here to MAPREDUCE-4843.
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570687#comment-13570687 ]

Hadoop QA commented on MAPREDUCE-4843:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567889/mr-4843.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3298//console

This message is automatically generated.

When using DefaultTaskController, JobLocalizer not thread safe
--
Key: MAPREDUCE-4843
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4843
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker
Affects Versions: 1.1.1
Reporter: zhaoyunjiong
Assignee: Karthik Kambatla
Priority: Critical
Attachments: MAPREDUCE-4843-branch-1.1.patch, mr-4843.patch

In our cluster, jobs sometimes fail with the exception below:

2012-12-03 23:11:54,811 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201212031626_1115_r_23_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/$username/jobcache/job_201212031626_1115/job.xml in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:424)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
    at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1175)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1058)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2213)

The root cause is that JobLocalizer is not thread safe. The DefaultTaskController.initializeJob method creates a new localizer on every call:

    JobLocalizer localizer = new JobLocalizer((JobConf)getConf(), user, jobid);

but JobLocalizer simply keeps a reference to that conf. When two TaskLauncher threads (mapLauncher and reduceLauncher) call initializeJob at the same time, there are two JobLocalizer instances but only one conf instance. As a result, ttConf.setStrings(JOB_LOCAL_CTXT, localDirs) sometimes overwrites the previous job's conf, and the previous job's job.xml ends up stored in another user's directory.
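The shared-conf problem described in MAPREDUCE-4843 can be illustrated with a small sketch that does not use Hadoop at all. Everything here is an assumption made for illustration: the Localizer class stands in for JobLocalizer, a plain Map stands in for the JobConf, the key string is made up (the report does not give the value of JOB_LOCAL_CTXT), and the interleaving of the two launcher threads is replayed sequentially so the result is deterministic. Giving each localizer its own copy is one possible remedy, not necessarily what the committed patch does.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedConfSketch {
    // Hypothetical key; the real value of JOB_LOCAL_CTXT is not given in the report.
    static final String LOCAL_CTXT_KEY = "job.local.ctxt";

    /** Hypothetical stand-in for JobLocalizer: it keeps the conf reference it is given. */
    static final class Localizer {
        private final Map<String, String> conf;

        Localizer(Map<String, String> conf) {
            this.conf = conf;
        }

        void setLocalDirs(String dirs) {
            conf.put(LOCAL_CTXT_KEY, dirs);
        }

        String localDirs() {
            return conf.get(LOCAL_CTXT_KEY);
        }
    }

    /** Both localizers share one conf, so the second write clobbers the first. */
    static String withSharedConf() {
        Map<String, String> ttConf = new HashMap<String, String>();
        Localizer first = new Localizer(ttConf);
        Localizer second = new Localizer(ttConf);
        first.setLocalDirs("/disk1/userA/job_1");
        second.setLocalDirs("/disk2/userB/job_2"); // interleaved call from the other launcher
        return first.localDirs();
    }

    /** Each localizer gets its own copy of the conf, so the writes cannot interfere. */
    static String withCopiedConf() {
        Map<String, String> ttConf = new HashMap<String, String>();
        Localizer first = new Localizer(new HashMap<String, String>(ttConf));
        Localizer second = new Localizer(new HashMap<String, String>(ttConf));
        first.setLocalDirs("/disk1/userA/job_1");
        second.setLocalDirs("/disk2/userB/job_2");
        return first.localDirs();
    }

    public static void main(String[] args) {
        System.out.println("shared conf: " + withSharedConf());
        System.out.println("copied conf: " + withCopiedConf());
    }
}
```

With the shared map, the first localizer reads back the second job's directories, which mirrors job.xml landing under another user's directory; with per-localizer copies it reads back its own.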
[jira] [Assigned] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong reassigned MAPREDUCE-4822:
---
Assignee: Chu Tong

Unnecessary conversions in History Events
-
Key: MAPREDUCE-4822
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobhistoryserver
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Chu Tong
Priority: Trivial
Attachments: MAPREDUCE-4822.patch

There are a number of conversions in the Job History Event classes that are totally unnecessary. It appears that they were originally used to convert from the internal Avro format, but many of the classes no longer pull the values from the Avro record; they store them internally. For example:

{code:title=TaskAttemptFinishedEvent.java}
/** Get the task type */
public TaskType getTaskType() {
  return TaskType.valueOf(taskType.toString());
}
{code}

The code currently takes an enum, converts it to a string, and then asks the same enum to convert it back to an enum. If Java works properly this should be a no-op, and a reference to the original taskType should be returned. There are also several places where toString is called on a string; since strings are immutable, it returns a reference to itself. The various ids are not immutable and probably should not be changed at this point.
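The claim in MAPREDUCE-4822 that the enum round trip is a no-op can be checked with a short standalone sketch. The TaskType enum below is a hypothetical two-value stand-in, not Hadoop's own class, and the sketch assumes the enum does not override toString (valueOf matches on the constant's name, so an overridden toString could make the round trip throw).

```java
final class EnumRoundTrip {
    /** Hypothetical stand-in for the task type enum; toString is not overridden. */
    enum TaskType { MAP, REDUCE }

    /** The conversion pattern quoted in the issue: enum -> String -> enum. */
    static TaskType roundTrip(TaskType taskType) {
        return TaskType.valueOf(taskType.toString());
    }

    /** The simplification the issue proposes: return the stored value directly. */
    static TaskType direct(TaskType taskType) {
        return taskType;
    }

    public static void main(String[] args) {
        for (TaskType t : TaskType.values()) {
            // Enum constants are singletons, so both paths yield the same reference.
            System.out.println(t + " same reference: " + (roundTrip(t) == direct(t)));
        }
        String s = "attempt";
        // String.toString() returns the receiver itself.
        System.out.println("String.toString() same reference: " + (s.toString() == s));
    }
}
```

Both paths return the identical enum constant, so the round trip only costs a string lookup without changing the result.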
[jira] [Commented] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570694#comment-13570694 ]

Alejandro Abdelnur commented on MAPREDUCE-4843:
---
+1. As per discussion in MAPREDUCE-4964 the latest patch seems a better way of doing it.
[jira] [Updated] (MAPREDUCE-4843) When using DefaultTaskController, JobLocalizer not thread safe
[ https://issues.apache.org/jira/browse/MAPREDUCE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-4843:
--
Resolution: Fixed
Fix Version/s: 1.2.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Thanks Karthik. Committed to branch-1. Arun, thanks for double checking on this one.
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated MAPREDUCE-4822:
Attachment: (was: MAPREDUCE-4822.patch)
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated MAPREDUCE-4822:
Attachment: MAPREDUCE-4822.patch
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated MAPREDUCE-4822:
Fix Version/s: 0.24.0
Labels: patch (was: )
Target Version/s: 0.23.4
Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570840#comment-13570840 ]

Hadoop QA commented on MAPREDUCE-4822:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567924/MAPREDUCE-4822.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3299//console

This message is automatically generated.
[jira] [Resolved] (MAPREDUCE-4973) Backport history clean up configurations to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla resolved MAPREDUCE-4973.
-
Resolution: Duplicate

Backport history clean up configurations to branch-1

Key: MAPREDUCE-4973
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4973
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

In trunk-based versions, we can configure the max-age of files after which they will be cleaned up. This JIRA is to backport those configurations to branch-1.
[jira] [Updated] (MAPREDUCE-4820) MRApps distributed-cache duplicate checks are incorrect
[ https://issues.apache.org/jira/browse/MAPREDUCE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4820:
-
Priority: Major (was: Blocker)

MRApps distributed-cache duplicate checks are incorrect
---
Key: MAPREDUCE-4820
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mr-am
Affects Versions: 2.0.2-alpha
Reporter: Alejandro Abdelnur
Fix For: 2.0.3-alpha

This seems to be a combination of issues that are exposed in 2.0.2-alpha by MAPREDUCE-4549.

MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs in the distributed-cache (using the JAR name as identity).

In Hadoop 2 (unlike Hadoop 1), all JARs in the distributed-cache are symlinked into the current directory of the task.

MRApps, when setting up the DistributedCache (MRApps#setupDistributedCache -> parseDistributedCacheArtifacts), assumes that the local resources (files in CURRENT_DIR/, CURRENT_DIR/classes/ and CURRENT_DIR/lib/) are already part of the distributed-cache.

For systems like Oozie, which use a launcher job to submit the real job, this poses a problem because MRApps is run from the launcher job to submit the real job. The configuration of the real job has the correct distributed-cache entries (no duplicates), but because the current dir has the same files, the submission fails.

It seems that MRApps should not be checking for dups in the distributed-cache against JARs in CURRENT_DIR/ or CURRENT_DIR/lib/. The dup check should be done among distributed-cache entries only.

It seems YARNRunner is symlinking all files in the distributed-cache into the current directory. In Hadoop 1 this was done only for files added to the distributed-cache using a fragment (i.e. #FOO) to trigger symlink creation.

Marking as a blocker because without a fix for this, Oozie cannot submit jobs to Hadoop 2 (I've debugged Oozie in a live cluster being used by BigTop -thanks Roman- to test their release work, and I've verified that Oozie 3.3 does not create duplicated entries in the distributed-cache).
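The suggestion in MAPREDUCE-4820, that the duplicate check should look only at the distributed-cache entries and use the JAR name as identity, can be sketched roughly as follows. This is not the MRApps code: the class and method names are invented for illustration, cache entries are modelled as plain path strings, and URI fragments such as #FOO are assumed to be absent.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class CacheDupCheckSketch {
    /** Returns the last path component, e.g. "a.jar" for "hdfs://nn/app/lib/a.jar". */
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /**
     * Finds JAR names that occur more than once among the cache entries themselves.
     * Files that merely exist in the submitter's current directory are never consulted.
     */
    static Set<String> duplicateNames(List<String> cacheEntries) {
        Set<String> seen = new HashSet<String>();
        Set<String> duplicates = new TreeSet<String>();
        for (String entry : cacheEntries) {
            String name = baseName(entry);
            if (!seen.add(name)) {
                duplicates.add(name); // same JAR name listed twice in the cache
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> entries = java.util.Arrays.asList(
                "hdfs://nn/app/lib/a.jar",
                "hdfs://nn/other/a.jar",
                "hdfs://nn/app/lib/b.jar");
        System.out.println("duplicates: " + duplicateNames(entries));
    }
}
```

Because only the entry list is inspected, a launcher job whose working directory already contains the same JARs would not trip the check.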
[jira] [Updated] (MAPREDUCE-4967) TestJvmReuse fails on assertion
[ https://issues.apache.org/jira/browse/MAPREDUCE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-4967:
Attachment: mr-4967.patch

Uploading a trivial patch that converts TestJvmReuse to use junit4. ant test -Dtestcase=TestJvmReuse doesn't run any tests now.

TestJvmReuse fails on assertion
---
Key: MAPREDUCE-4967
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4967
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: tasktracker, test
Affects Versions: 1.1.2
Reporter: Chris Nauroth
Assignee: Karthik Kambatla
Attachments: mr-4967.patch

{{TestJvmReuse}} on branch-1 consistently fails on an assertion.
[jira] [Updated] (MAPREDUCE-4967) TestJvmReuse fails on assertion
[ https://issues.apache.org/jira/browse/MAPREDUCE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-4967:
Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-4967) TestJvmReuse fails on assertion
[ https://issues.apache.org/jira/browse/MAPREDUCE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570884#comment-13570884 ]

Surenkumar Nihalani commented on MAPREDUCE-4967:
+1
[jira] [Commented] (MAPREDUCE-4967) TestJvmReuse fails on assertion
[ https://issues.apache.org/jira/browse/MAPREDUCE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570885#comment-13570885 ]

Hadoop QA commented on MAPREDUCE-4967:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567939/mr-4967.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3300//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-4434) Backport MR-2779 (JobSplitWriter.java can't handle large job.split file) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-4434:
Target Version/s: 1.2.0
Affects Version/s: (was: 1.0.3) 1.1.1
Status: Patch Available (was: Open)

Backport MR-2779 (JobSplitWriter.java can't handle large job.split file) to branch-1

Key: MAPREDUCE-4434
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4434
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Attachments: mr-4434.patch
[jira] [Updated] (MAPREDUCE-4434) Backport MR-2779 (JobSplitWriter.java can't handle large job.split file) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated MAPREDUCE-4434:
Attachment: mr-4434.patch

Trivial backport of MR-2779. Verified TestJobSplitWriter passes.
[jira] [Commented] (MAPREDUCE-4434) Backport MR-2779 (JobSplitWriter.java can't handle large job.split file) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13570890#comment-13570890 ]

Hadoop QA commented on MAPREDUCE-4434:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567941/mr-4434.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3301//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571025#comment-13571025 ]

Sandy Ryza commented on MAPREDUCE-2264:
---
Arun, I haven't heard from you on this. As most of the changes are unrelated to the original issue, I'll mark this as resolved and work on a cleanup JIRA tomorrow unless you say otherwise.

Job status exceeds 100% in some cases
--
Key: MAPREDUCE-2264
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 0.20.2, 0.20.205.0
Reporter: Adam Kramer
Assignee: Devaraj K
Labels: critical-0.22.0
Fix For: 1.2.0, 2.0.3-alpha
Attachments: MAPREDUCE-2264-0.20.205-1.patch, MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk-4.patch, MAPREDUCE-2264-trunk-5.patch, MAPREDUCE-2264-trunk-5.patch, MAPREDUCE-2264-trunk.patch, more than 100%.bmp

I'm looking now at my jobtracker's list of running reduce tasks. One of them is 120.05% complete, the other is 107.28% complete. I understand that these numbers are estimates, but there is no case in which an estimate of 100% for a non-complete task is better than an estimate of 99.99%, nor is there any case in which an estimate greater than 100% is valid. I suggest that whatever logic is computing these set 99.99% as a hard maximum.
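The reporter's suggestion in MAPREDUCE-2264, a hard maximum of 99.99% for tasks that have not finished, amounts to a simple clamp. The sketch below illustrates only that suggestion; the class and method names are hypothetical, and it says nothing about how the attached patches actually address the estimate.

```java
final class ProgressClamp {
    /** Upper bound for a task that is still running, as proposed in the report. */
    static final double MAX_RUNNING_PERCENT = 99.99;

    /**
     * Clamps a progress estimate into [0, 99.99] while the task is running
     * and reports exactly 100 once the task is complete.
     */
    static double clampPercent(double estimate, boolean complete) {
        if (complete) {
            return 100.0;
        }
        return Math.max(0.0, Math.min(estimate, MAX_RUNNING_PERCENT));
    }

    public static void main(String[] args) {
        System.out.println(clampPercent(120.05, false));
        System.out.println(clampPercent(42.0, false));
        System.out.println(clampPercent(120.05, true));
    }
}
```

A clamp like this hides the symptom in the display; the underlying estimate would still need to be corrected separately.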
[jira] [Commented] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571039#comment-13571039 ]

Colin Patrick McCabe commented on MAPREDUCE-4953:
-
looks good to me

HadoopPipes misuses fprintf
---
Key: MAPREDUCE-4953
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4953
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: pipes
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Attachments: mapreduce-4953.txt

{code}
[exec] /mnt/trunk/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/HadoopPipes.cc:130:58: warning: format not a string literal and no format arguments [-Wformat-security]
{code}
[jira] [Commented] (MAPREDUCE-2308) Sort buffer size (io.sort.mb) is limited to 2 GB
[ https://issues.apache.org/jira/browse/MAPREDUCE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13571040#comment-13571040 ]

kiran sreekumar commented on MAPREDUCE-2308:
io.sort.mb should be 10 * io.sort.factor. HADOOP-3473 suggests keeping the default at 100.

Sort buffer size (io.sort.mb) is limited to 2 GB
--
Key: MAPREDUCE-2308
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2308
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0
Environment: Cloudera CDH3b3 (0.20.2+)
Reporter: Jay Hacker
Priority: Minor

I have MapReduce jobs that use a large amount of per-task memory, because the algorithm I'm using converges faster if more data is together on a node. I have my JVM heap size set at 3200 MB, and if I use the popular rule of thumb that io.sort.mb should be ~70% of that, I get 2240 MB. I rounded this down to 2048 MB, but map tasks crash with:

{noformat}
java.io.IOException: Invalid io.sort.mb: 2048
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:790)
  ...
{noformat}

MapTask.MapOutputBuffer implements its buffer with a byte[] of size io.sort.mb (in bytes), and sanity-checks the size before allocating the array. The problem is that Java arrays can't have more than 2^31 - 1 elements (even with a 64-bit JVM), and this is a limitation of the Java language specification itself. As memory and data sizes grow, this would seem to be a crippling limitation of Java.

It would be nice if this ceiling were documented, and an error issued sooner, e.g. in jobtracker startup upon reading the config. Going forward, we may need to implement some array-of-arrays hack for large buffers. :(
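The arithmetic behind the 2 GB ceiling in MAPREDUCE-2308 is easy to reproduce: a buffer of N megabytes needs N * 2^20 bytes, and a Java array index is an int. The sketch below is an illustration only; the class and method names are hypothetical, and it does not reproduce the actual sanity check in MapTask.

```java
final class SortBufferLimit {
    /** Converts megabytes to bytes using int arithmetic, which can silently overflow. */
    static int toBytesAsInt(int megabytes) {
        return megabytes << 20;
    }

    /** Checks in 64-bit arithmetic whether a byte[] of this many megabytes is expressible. */
    static boolean fitsInByteArray(int megabytes) {
        long bytes = (long) megabytes << 20;
        return megabytes > 0 && bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 2047 MB = 2146435072 bytes, just under 2^31 - 1.
        System.out.println("2047 MB -> " + toBytesAsInt(2047) + ", fits: " + fitsInByteArray(2047));
        // 2048 MB = 2^31 bytes, which wraps to Integer.MIN_VALUE as an int.
        System.out.println("2048 MB -> " + toBytesAsInt(2048) + ", fits: " + fitsInByteArray(2048));
    }
}
```

The largest whole-megabyte value that passes is 2047, which matches the report: 2048 is the first value whose byte count no longer fits in an int. Individual JVMs may cap array sizes slightly below 2^31 - 1.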
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated MAPREDUCE-4822:
Status: Open (was: Patch Available)
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chu Tong updated MAPREDUCE-4822:
Attachment: (was: MAPREDUCE-4822.patch)
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Attachment: MAPREDUCE-4822.patch
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571045#comment-13571045 ] Aaron T. Myers commented on MAPREDUCE-4953: --- +1, the patch looks good to me. I confirmed that this gets rid of the compiler warning. I'm going to commit this momentarily. HadoopPipes misuses fprintf --- Key: MAPREDUCE-4953 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4953 Project: Hadoop Map/Reduce Issue Type: Bug Components: pipes Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Attachments: mapreduce-4953.txt {code} [exec] /mnt/trunk/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/HadoopPipes.cc:130:58: warning: format not a string literal and no format arguments [-Wformat-security] {code}
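The warning quoted above is about passing non-literal text as a format string. The affected code is C++, and the patch itself is not shown in this thread, so the following is only a Java analogue of the same class of mistake, not the actual fix. FormatStringDemo and its methods are hypothetical names. In Java the failure mode is an exception rather than undefined behaviour, but the remedy has the same shape: keep the format string constant and pass the text as an argument.

```java
import java.util.IllegalFormatException;

class FormatStringDemo {
    /** Unsafe: treats caller-supplied text as the format string itself. */
    static String unsafe(String message) {
        return String.format(message);
    }

    /** Safe: constant format string, with the message passed as an argument. */
    static String safe(String message) {
        return String.format("%s", message);
    }

    public static void main(String[] args) {
        // The "%x" inside the message looks like a conversion with no argument.
        String message = "progress: 50%x";
        System.out.println(safe(message));
        try {
            System.out.println(unsafe(message));
        } catch (IllegalFormatException e) {
            System.out.println("unsafe call failed: " + e.getClass().getSimpleName());
        }
    }
}
```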
[jira] [Updated] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated MAPREDUCE-4953: -- Resolution: Fixed Fix Version/s: 2.0.3-alpha Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Andy.
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Status: Open (was: Patch Available)
[jira] [Commented] (MAPREDUCE-4953) HadoopPipes misuses fprintf
[ https://issues.apache.org/jira/browse/MAPREDUCE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571052#comment-13571052 ] Hudson commented on MAPREDUCE-4953: --- Integrated in Hadoop-trunk-Commit #3323 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3323/]) MAPREDUCE-4953. HadoopPipes misuses fprintf. Contributed by Andy Isaacson. (Revision 1442471) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1442471 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-tools/hadoop-pipes/src/main/native/pipes/impl/HadoopPipes.cc
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Attachment: (was: MAPREDUCE-4822.patch)
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Attachment: MAPREDUCE-4822.patch No test case is included, as the change is trivial. The JavaDoc warnings are a false positive: the same number of warnings is generated on a clean build in my dev environment. -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 20 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings.
[jira] [Updated] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-4822: Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-4822) Unnecessary conversions in History Events
[ https://issues.apache.org/jira/browse/MAPREDUCE-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571122#comment-13571122 ] Hadoop QA commented on MAPREDUCE-4822: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567961/MAPREDUCE-4822.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3302//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3302//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571123#comment-13571123 ] Gelesh commented on MAPREDUCE-4974: --- [~tlipcon] I tried an estimate locally, with a small data set, by subtracting the long values obtained from System.nanoTime() at the beginning and at the end of the method. The average time difference was 200 nanoseconds for each call made to nextKeyValue(), excluding the very first call, since that one involves the object creation. The total time difference would be 200 ns * the number of key/value pairs generated per map task. Optimising the LineRecordReader initialize() method --- Key: MAPREDUCE-4974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 2.0.2-alpha, 0.23.5 Environment: Hadoop Linux Reporter: Arun A K Assignee: Gelesh Labels: patch, performance Fix For: 0.20.204.0, 0.24.0 Attachments: MAPREDUCE-4974.1.patch Original Estimate: 1h Remaining Estimate: 1h I found there is scope for optimizing the code in initialize(): compressionCodecs and codec could be instantiated only if the input is compressed. Meanwhile, Gelesh George Omathil suggested that we could avoid the null check of key and value. This would save time, since the null check is done for every key/value pair generated. The intention is to instantiate only once and avoid an NPE as well. Hopefully both can be achieved if key and value are initialized in the initialize() method. We both have worked on it.
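The idea under discussion (create the value object once in initialize() instead of null-checking it on every nextKeyValue() call) and the System.nanoTime() measurement can be sketched as below. This is not the LineRecordReader code or the attached patch: LazyReader, EagerReader and the StringBuilder value are hypothetical stand-ins. A single nanoTime difference like this is also noisy and subject to JIT warm-up, so any figure it prints should be read as a rough indication only.

```java
class LazyVsEagerDemo {
    /** Lazy variant: checks for null on every call, as the issue describes. */
    static final class LazyReader {
        private StringBuilder value; // stand-in for the record value object
        private long pos;

        boolean nextKeyValue(long limit) {
            if (value == null) {     // evaluated for every record
                value = new StringBuilder();
            }
            if (pos >= limit) {
                return false;
            }
            value.setLength(0);
            value.append(pos++);
            return true;
        }
    }

    /** Eager variant: the object is created once, in initialize(). */
    static final class EagerReader {
        private StringBuilder value;
        private long pos;

        void initialize() {
            value = new StringBuilder();
        }

        boolean nextKeyValue(long limit) {
            if (pos >= limit) {
                return false;
            }
            value.setLength(0);
            value.append(pos++);
            return true;
        }
    }

    static long countLazy(long limit) {
        LazyReader r = new LazyReader();
        long n = 0;
        while (r.nextKeyValue(limit)) {
            n++;
        }
        return n;
    }

    static long countEager(long limit) {
        EagerReader r = new EagerReader();
        r.initialize(); // must run before the first nextKeyValue() call
        long n = 0;
        while (r.nextKeyValue(limit)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        long t0 = System.nanoTime();
        long lazy = countLazy(records);
        long t1 = System.nanoTime();
        long eager = countEager(records);
        long t2 = System.nanoTime();
        System.out.printf("lazy:  %d records in %d ns%n", lazy, t1 - t0);
        System.out.printf("eager: %d records in %d ns%n", eager, t2 - t1);
    }
}
```

The eager variant trades the per-record check for a contract that initialize() is always called first; if a caller skips it, the first nextKeyValue() call fails with an NPE, which is the risk the issue text alludes to.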
[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avner BenHanoch updated MAPREDUCE-4049: --- Attachment: MAPREDUCE-4049--branch-1.patch plugin for generic shuffle service -- Key: MAPREDUCE-4049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: performance, task, tasktracker Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 Reporter: Avner BenHanoch Assignee: Avner BenHanoch Labels: merge, plugin, rdma, shuffle Fix For: 2.0.3-alpha Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, MAPREDUCE-4049--branch-1.patch, MAPREDUCE-4049--branch-1.patch, MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch Support a generic shuffle service as a set of two plugins: ShuffleProvider and ShuffleConsumer. This will satisfy the following needs: # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, or Infiniband) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also utilize a suitable merge approach during the intermediate merges, and hence get much better performance. # Satisfy MAPREDUCE-3060 - generic shuffle service for avoiding hidden dependency of NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0). References: # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch) # I am providing a link for downloading UDA - Mellanox's open source plugin that implements a generic shuffle service using RDMA and levitated merge. Note: At this phase, the code is in C++ through JNI and you should consider it as beta only. Still, it can serve anyone who wants to implement or contribute to levitated merge. (Please be advised that levitated merge is mostly suited to very fast networks) - [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
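To make the plugin idea above concrete, here is a minimal sketch of how a consumer-side plugin could be selected by class name from configuration, with the existing HTTP shuffle as the default. Everything in it is an assumption for illustration: DemoShuffleConsumer, the two implementations, the demo.shuffle.consumer.class key and the plain Map used as configuration are hypothetical and are not the interfaces or property names from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer-side plugin contract; not the interface from the patch. */
interface DemoShuffleConsumer {
    String fetch(String mapOutputId);
}

/** Default implementation, standing in for the existing HTTP shuffle. */
class HttpDemoConsumer implements DemoShuffleConsumer {
    public String fetch(String mapOutputId) {
        return "http:" + mapOutputId;
    }
}

/** Alternative implementation, standing in for an RDMA-based plugin. */
class RdmaDemoConsumer implements DemoShuffleConsumer {
    public String fetch(String mapOutputId) {
        return "rdma:" + mapOutputId;
    }
}

class ShufflePluginDemo {
    /** Hypothetical configuration key naming the plugin class. */
    static final String KEY = "demo.shuffle.consumer.class";

    /** Instantiates the configured plugin, falling back to the HTTP default. */
    static DemoShuffleConsumer load(Map<String, String> conf) {
        String className = conf.getOrDefault(KEY, HttpDemoConsumer.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(DemoShuffleConsumer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load shuffle plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(load(conf).fetch("map_0001"));   // default plugin

        conf.put(KEY, RdmaDemoConsumer.class.getName());
        System.out.println(load(conf).fetch("map_0001"));   // configured plugin
    }
}
```

Loading by class name keeps the caller free of any compile-time dependency on a particular shuffle implementation, which is the decoupling the issue asks for.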
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571125#comment-13571125 ] Avner BenHanoch commented on MAPREDUCE-4049: Arun, Alejandro, I attached a new patch for branch-1 that addresses all your comments. I am still passing ReduceTask to the plugin, according to my explanation in the previous comment. Cheers, Avner