[jira] [Resolved] (MAPREDUCE-5994) native-task: TestBytesUtil fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5994. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch, thanks Sean. native-task: TestBytesUtil fails Key: MAPREDUCE-5994 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5994 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5994.txt This class appears to have some bugs. Two tests fail consistently on my system. BytesUtil itself appears to duplicate a lot of code from guava - we should probably just use the Guava functions. -- This message was sent by Atlassian JIRA (v6.2#6252)
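For the Guava suggestion above: Guava's com.google.common.primitives helpers (Longs.toByteArray/fromByteArray and the Ints equivalents) already provide the big-endian conversions that BytesUtil duplicates. This JDK-only sketch shows the same contract those helpers satisfy, as a reference for what the replacement would compute:

```java
import java.nio.ByteBuffer;

// Illustrative: the big-endian long<->byte[] contract implemented by
// Guava's Longs.toByteArray / Longs.fromByteArray, expressed with the JDK.
public class BigEndianBytes {
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }
    public static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }
}
```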
[jira] [Resolved] (MAPREDUCE-5996) native-task: Rename system tests into standard directory layout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5996. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks for reviewing, Binglin. native-task: Rename system tests into standard directory layout --- Key: MAPREDUCE-5996 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5996 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5996.txt Currently there are a number of tests in src/java/system. This confuses IDEs which think that the package should then be system.org.apache.hadoop instead of just org.apache.hadoop. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated MAPREDUCE-6002: -- Status: Patch Available (was: Open) MR task should prevent report error to AM when process is shutting down --- Key: MAPREDUCE-6002 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: MR-6002.patch With MAPREDUCE-5900, a preempted MR task should not be treated as failed. But it is still possible for an MR task to fail and report to the AM when preemption takes effect before the AM has received the completed container from the RM, which causes the task attempt to be marked failed instead of preempted. For example, FileSystem registers a shutdown hook that closes all FileSystem instances; if a FileSystem is still in use at that moment (e.g. reading split details from HDFS), the MR task fails and reports a fatal error to the MR AM. 
An exception will be raised: {code}
2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_25_0 - exited : java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
	at java.io.DataInputStream.readByte(DataInputStream.java:265)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
	at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
	at org.apache.hadoop.io.Text.readString(Text.java:464)
	at org.apache.hadoop.io.Text.readString(Text.java:457)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
{code} We should prevent this: other exceptions can also occur while the process is shutting down, and none of them should be reported to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
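The guard described above can be sketched in plain Java. Hadoop's ShutdownHookManager exposes isShutdownInProgress() for this purpose; as a self-contained illustration (not the committed patch), one can also probe the JVM directly, since registering a shutdown hook throws IllegalStateException once shutdown has begun:

```java
// Illustrative sketch only: detect whether the JVM is already shutting down
// before reporting a fatal error to the AM. Once shutdown has started, the
// JVM rejects new shutdown hooks with IllegalStateException, which makes a
// throwaway hook a portable probe.
public class ShutdownProbe {
    public static boolean isShuttingDown() {
        Thread probe = new Thread(() -> { });
        try {
            Runtime.getRuntime().addShutdownHook(probe);
            Runtime.getRuntime().removeShutdownHook(probe); // undo the probe
            return false;
        } catch (IllegalStateException e) {
            return true; // shutdown in progress: suppress the fatal report
        }
    }
}
```

A task would consult such a check (or ShutdownHookManager) before calling the fatal-error umbilical, so exceptions caused by the shutdown itself are never reported as task failures.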
[jira] [Updated] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated MAPREDUCE-6002: -- Attachment: MR-6002.patch Attached a patch for review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous updated MAPREDUCE-1380: - Target Version/s: 2.4.1 Affects Version/s: 2.4.1 Hadoop Flags: Incompatible change,Reviewed Status: Patch Available (was: Reopened) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
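The deadline prioritization described above amounts to earliest-deadline-first ordering. A minimal sketch (hypothetical names, not the scheduler's actual code) of how simultaneous jobs would be ranked:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of the deadline-based prioritization described above:
// jobs are ordered earliest-deadline-first, so the most urgent job is the
// first candidate for additional resources.
public class DeadlineQueue {
    public static final class Job {
        public final String id;
        public final long deadlineMillis; // user-supplied at submission time
        public Job(String id, long deadlineMillis) {
            this.id = id;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final PriorityQueue<Job> queue =
        new PriorityQueue<>(Comparator.comparingLong((Job j) -> j.deadlineMillis));

    public void submit(Job j) { queue.add(j); }
    public Job mostUrgent() { return queue.peek(); }
}
```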
[jira] [Updated] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-5997: --- Attachment: mapreduce-5997.txt Attached patch switches over to using the implementation in common. This implementation doesn't provide a singleton, so I changed it to use a static instance inside InputBuffer. In order to do that, I also changed the returnBuffer call to happen in a new InputBuffer.close() method, and made InputBuffer implement Closeable. That'll make sure that findbugs properly warns if anyone uses InputBuffers without calling close() native-task: Use DirectBufferPool from Hadoop Common Key: MAPREDUCE-5997 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5997 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5997.txt The native task code has its own direct buffer pool, but Hadoop already has an implementation. HADOOP-10882 will move that implementation into Common, and this JIRA is to remove the duplicate code and use that one instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
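The Closeable pattern described in the patch can be sketched as follows; SimplePool here is a simplified stand-in for Hadoop Common's DirectBufferPool, and all names are illustrative rather than the branch code:

```java
import java.io.Closeable;
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hedged sketch of the pattern described above: the ByteBuffer is borrowed
// from a single static pool and handed back in close(), so a buffer that is
// never closed leaks from the pool -- exactly the condition findbugs can
// flag once the class implements Closeable.
public class PooledInputBuffer implements Closeable {

    // Simplified stand-in for Hadoop Common's DirectBufferPool.
    static final class SimplePool {
        private final Queue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
        ByteBuffer borrow(int capacity) {
            ByteBuffer b = free.poll();
            return (b != null && b.capacity() >= capacity)
                ? b : ByteBuffer.allocateDirect(capacity);
        }
        void giveBack(ByteBuffer b) { b.clear(); free.add(b); }
        int size() { return free.size(); }
    }

    private static final SimplePool POOL = new SimplePool(); // static instance

    private ByteBuffer buf;

    public PooledInputBuffer(int capacity) { buf = POOL.borrow(capacity); }

    public ByteBuffer buffer() { return buf; }

    public static int pooledCount() { return POOL.size(); }

    @Override
    public void close() { // idempotent return-to-pool
        if (buf != null) {
            POOL.giveBack(buf);
            buf = null;
        }
    }
}
```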
[jira] [Commented] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072900#comment-14072900 ] Sean Zhong commented on MAPREDUCE-5997: --- looks good to me, +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned MAPREDUCE-5991: Assignee: Binglin Chang native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
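One way to realize the second option above (disabling the tests from the pom unless -Pnative is specified) is a surefire skip flag that the native profile flips; a hypothetical sketch, not the committed change:

```xml
<!-- Hypothetical sketch: surefire skips native-task unit tests unless the
     native profile overrides the flag (mvn test -Pnative). -->
<properties>
  <skip.nativetask.tests>true</skip.nativetask.tests>
</properties>
<profiles>
  <profile>
    <id>native</id>
    <properties>
      <skip.nativetask.tests>false</skip.nativetask.tests>
    </properties>
  </profile>
</profiles>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <skipTests>${skip.nativetask.tests}</skipTests>
      </configuration>
    </plugin>
  </plugins>
</build>
```

The JUnit alternative mentioned first would be an Assume guard (e.g. Assume.assumeTrue(NativeCodeLoader.isNativeCodeLoaded())) in each test that depends on native code.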
[jira] [Updated] (MAPREDUCE-5995) native-task: revert changes which expose Text internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-5995: --- Attachment: mapreduce-5995.txt Actually, just rebased the patch onto the tip of branch instead. Hopefully this one will apply properly for you, Manu. native-task: revert changes which expose Text internals --- Key: MAPREDUCE-5995 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5995 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5995.txt, mapreduce-5995.txt The current branch has some changes to the Text writable which allow it to manually set the backing array, capacity, etc. Rather than exposing these internals, we should use the newly-committed facility from HADOOP-10855 to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-883: --- Assignee: Akira AJISAKA harchive: Document how to unarchive --- Key: MAPREDUCE-883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-883 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, harchive Reporter: Koji Noguchi Assignee: Akira AJISAKA Priority: Minor Attachments: mapreduce-883-0.patch I was thinking of implementing harchive's 'unarchive' feature, but realized it has been implemented already ever since harchive was introduced. It just needs to be documented. -- This message was sent by Atlassian JIRA (v6.2#6252)
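For reference, the already-implemented "unarchive" path that needs documenting is simply a copy out of the har:// filesystem (all paths below are examples):

```shell
# A .har archive is exposed as a read-only filesystem, so "unarchiving"
# is just copying files back out of it.
hadoop fs -ls har:///user/alice/data.har            # browse archive contents
hadoop fs -cp har:///user/alice/data.har/dir1 /user/alice/restored
# For large archives, DistCp performs the copy in parallel:
hadoop distcp har:///user/alice/data.har /user/alice/restored
```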
[jira] [Updated] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated MAPREDUCE-5991: - Attachment: MAPREDUCE-5991.v1.patch Changes: 1. Add the hadoop-mapreduce-client-common test jar, so the additionalClasspathElements entry can be removed. 2. Remove the system tests from the default profile and add them to the native profile. Plain mvn test now succeeds; note that mvn test -Pnative is still failing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-883: Labels: newbie (was: ) Target Version/s: 2.6.0 Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-883: Attachment: MAPREDUCE-883.1.patch Rebased for the latest trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072945#comment-14072945 ] Hadoop QA commented on MAPREDUCE-1380: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12630631/MAPREDUCE-1380-branch-1.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4765//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6002) MR task should prevent report error to AM when process is shutting down
[ https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072957#comment-14072957 ] Hadoop QA commented on MAPREDUCE-6002: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657552/MR-6002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4764//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072973#comment-14072973 ] Hadoop QA commented on MAPREDUCE-883: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657567/MAPREDUCE-883.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4766//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4766//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-6000: --- Attachment: mapreduce-6000.txt Uploading new rev, per above. native-task: Simplify ByteBufferDataReader/Writer - Key: MAPREDUCE-6000 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6000 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-6000.txt, mapreduce-6000.txt The ByteBufferDataReader and ByteBufferDataWriter class are more complex than necessary: - several methods related to reading/writing strings and char arrays are implemented but never used by the native task code. Given that the use case for these classes is limited to serializing binary data to/from the native code, it seems unlikely people will want to use these methods in any performance-critical space. So, let's do simpler implementations that are less likely to be buggy, even if they're slightly less performant. - methods like readLine() are even less likely to be used. Since it's a complex implementation, let's just throw UnsupportedOperationException - in the test case, we can use Mockito to shorten the amount of new code -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072976#comment-14072976 ] Todd Lipcon commented on MAPREDUCE-6000: I chatted offline with Sean a bit about this one, and we realized there are a couple of issues: - He made the point that, even if we don't internally use all of the methods, users might expect a DataOutput/DataInput implementation to be fully functional. - I hadn't realized that DataInput/DataOutput use modified UTF-8, not actual UTF-8. I added a new test case with a Unicode cat face and found that my new implementation was incompatible with Java's, since I was using real UTF-8 rather than the modified form. But we still need to do something like this patch, since the current code appears to copy-paste material from the Oracle JDK and thus isn't Apache-license-compatible. So, I'll upload a new rev of the patch with the following changes since the previous one: - Add a test case with a Unicode cat face (it's outside the Basic Multilingual Plane, so it tends to expose bugs like the above). - Instead of trying to implement the more complex methods, the reader/writer classes now hold an internal java.io.Data{Input,Output}Stream instance and delegate to it for the more substantial methods, so we don't have to duplicate any encoding/decoding code. This is safe since those classes don't do any of their own buffering. -- This message was sent by Atlassian JIRA (v6.2#6252)
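The delegation approach described in the comment above can be sketched as follows (illustrative names, not the branch code): expose the ByteBuffer through a tiny InputStream and let java.io.DataInputStream do the decoding, so the modified-UTF-8 rules (supplementary code points written as two 3-byte surrogate encodings, NUL as 0xC0 0x80) never have to be reimplemented:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hedged sketch: a minimal ByteBuffer-backed InputStream, with all string
// decoding delegated to DataInputStream instead of being hand-rolled.
public class ModifiedUtf8Demo {

    static final class ByteBufferInputStream extends InputStream {
        private final ByteBuffer buf;
        ByteBufferInputStream(ByteBuffer buf) { this.buf = buf; }
        @Override public int read() {
            return buf.hasRemaining() ? (buf.get() & 0xff) : -1;
        }
    }

    /** writeUTF with the JDK, then readUTF back through the buffer-backed stream. */
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(s); // modified UTF-8 encoding
            ByteBuffer buf = ByteBuffer.wrap(bytes.toByteArray());
            return new DataInputStream(new ByteBufferInputStream(buf)).readUTF();
        } catch (IOException e) {
            throw new AssertionError(e); // in-memory streams shouldn't fail
        }
    }
}
```

Because both sides go through the JDK's Data{Input,Output}Stream, a string containing a character outside the Basic Multilingual Plane (such as a cat-face emoji) round-trips correctly, which is exactly what the hand-rolled "real" UTF-8 encoder got wrong.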
[jira] [Resolved] (MAPREDUCE-5997) native-task: Use DirectBufferPool from Hadoop Common
[ https://issues.apache.org/jira/browse/MAPREDUCE-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-5997. Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks for reviewing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072981#comment-14072981 ] Sean Zhong commented on MAPREDUCE-6000: --- looks great, +1 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5992) native-task test logs should not write to console
[ https://issues.apache.org/jira/browse/MAPREDUCE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned MAPREDUCE-5992: -- Assignee: (was: Todd Lipcon) native-task test logs should not write to console - Key: MAPREDUCE-5992 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5992 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Most of our unit tests are configured with a log4j.properties test resource so they don't spout a bunch of output to the console. We need to do the same for native-task. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-6000) native-task: Simplify ByteBufferDataReader/Writer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved MAPREDUCE-6000. Resolution: Fixed Hadoop Flags: Reviewed Thanks for the prompt review. Committed to branch. native-task: Simplify ByteBufferDataReader/Writer - Key: MAPREDUCE-6000 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6000 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-6000.txt, mapreduce-6000.txt The ByteBufferDataReader and ByteBufferDataWriter class are more complex than necessary: - several methods related to reading/writing strings and char arrays are implemented but never used by the native task code. Given that the use case for these classes is limited to serializing binary data to/from the native code, it seems unlikely people will want to use these methods in any performance-critical space. So, let's do simpler implementations that are less likely to be buggy, even if they're slightly less performant. - methods like readLine() are even less likely to be used. Since it's a complex implementation, let's just throw UnsupportedOperationException - in the test case, we can use Mockito to shorten the amount of new code -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072988#comment-14072988 ] Todd Lipcon commented on MAPREDUCE-5987: Can you give some more context on the bug this is referring to? I found the description in the RHEL release notes: {quote} Prior to this update, the internal FILE offset was set incorrectly in wide character streams. As a consequence, the offset returned by ftell was incorrect. In some cases, this could result in over-writing data. This update modifies the ftell code to correctly set the internal FILE offset field for wide characters. Now, ftell and fseek handle the offset as expected. {quote} but best I can tell, the nativetask code never calls ftell() or fseek(). Running 'nm' on the libnativetask.so confirms that they aren't linked in. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072991#comment-14072991 ] Todd Lipcon commented on MAPREDUCE-5991: +1, looks good to me. Feel free to commit! native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
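The "JUnit Assume" option mentioned above boils down to the control flow below. This self-contained sketch simulates it without JUnit (the real patch would call Assume.assumeTrue(...) with Hadoop's NativeCodeLoader check); the class and library names here are illustrative.

```java
// Self-contained sketch of the "skip when native code is missing" idea.
// In a real JUnit test, Assume.assumeTrue(isNativeCodeLoaded()) would mark
// the test as skipped instead of failed.
public class NativeGuardSketch {
  // Stand-in for org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded().
  static boolean isNativeCodeLoaded() {
    try {
      // Fails with UnsatisfiedLinkError when the build ran without -Pnative.
      System.loadLibrary("nativetask");
      return true;
    } catch (UnsatisfiedLinkError e) {
      return false;
    }
  }

  static String runTest() {
    if (!isNativeCodeLoaded()) {
      return "SKIPPED"; // Assume.assumeTrue would abort the test here
    }
    return "RAN"; // test body that calls into libnativetask
  }
}
```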
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073003#comment-14073003 ] Sean Zhong commented on MAPREDUCE-5987: --- The steps to reproduce this bug: 1. allocate a small direct buffer, like 10 bytes 2. prepare a large data set on the Java side, say 1MB, and make the source data an incremental sequence. 3. write the data; it will first try to fill the direct buffer, and when it is full, it will notify the native side to fetch the data, over and over. 4. on the native side, check the flushed data and make sure it is also sequential. Occasionally, one data element is corrupted. 5. the bug can only be reproduced when the direct buffer size is extremely small. After the glibc update to https://rhn.redhat.com/errata/RHBA-2013-0279.html, this no longer happens. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
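The reproduction steps above can be modeled on the Java side roughly as follows. This is a pure-Java sketch of the handoff pattern (small direct buffer, drained to a consumer whenever full), not the actual native-task code; names are illustrative.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the Java->native handoff described above: a tiny
// direct buffer is filled with an incremental sequence and "flushed" to a
// consumer each time it fills up. The consumer then verifies the sequence.
public class DirectBufferRepro {
  public static List<Integer> pushThroughSmallBuffer(int count, int bufBytes) {
    ByteBuffer direct = ByteBuffer.allocateDirect(bufBytes); // e.g. ~10 bytes
    List<Integer> received = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      if (direct.remaining() < Integer.BYTES) {
        drain(direct, received); // "notify native side to fetch the data"
      }
      direct.putInt(i);
    }
    drain(direct, received); // flush the tail
    return received; // should be 0,1,2,... with no corruption
  }

  private static void drain(ByteBuffer buf, List<Integer> out) {
    buf.flip();
    while (buf.remaining() >= Integer.BYTES) {
      out.add(buf.getInt());
    }
    buf.compact(); // keep any partial element for the next round
  }
}
```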
[jira] [Assigned] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned MAPREDUCE-5999: Assignee: Akira AJISAKA Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-5999: - Target Version/s: 2.6.0 Status: Patch Available (was: Open) Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated MAPREDUCE-5999: - Attachment: MAPREDUCE-5999.patch Attaching a patch. Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5995) native-task: revert changes which expose Text internals
[ https://issues.apache.org/jira/browse/MAPREDUCE-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073023#comment-14073023 ] Manu Zhang commented on MAPREDUCE-5995: --- looks good and passed kv test for Text at my side. +1 native-task: revert changes which expose Text internals --- Key: MAPREDUCE-5995 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5995 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-5995.txt, mapreduce-5995.txt The current branch has some changes to the Text writable which allow it to manually set the backing array, capacity, etc. Rather than exposing these internals, we should use the newly-committed facility from HADOOP-10855 to implement this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5999) Fix dead link in InputFormat javadoc
[ https://issues.apache.org/jira/browse/MAPREDUCE-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073039#comment-14073039 ] Hadoop QA commented on MAPREDUCE-5999: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657581/MAPREDUCE-5999.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4767//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4767//console This message is automatically generated. 
Fix dead link in InputFormat javadoc Key: MAPREDUCE-5999 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5999 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5999.patch In [InputFormat|http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapred/InputFormat.html] javadoc, there is a dead link 'mapreduce.input.fileinputformat.split.minsize'. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5987) native-task: Unit test TestGlibCBug fails on ubuntu
[ https://issues.apache.org/jira/browse/MAPREDUCE-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073046#comment-14073046 ] Binglin Chang commented on MAPREDUCE-5987: -- Hi Sean, I don't see how the steps in your comment relate to the test code. The test code just reads from a file sequentially and checks that the data is not corrupted. native-task: Unit test TestGlibCBug fails on ubuntu --- Key: MAPREDUCE-5987 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5987 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Sean Zhong Assignee: Sean Zhong Priority: Minor On ubuntu12, glibc: 2.15-0ubuntu10.3, UT TestGlibCBug fails [ RUN ] IFile.TestGlibCBug 14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out /home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure Value of: realKey Actual: 1127504685 Expected: expect[index] Which is: 4102672832 [ FAILED ] IFile.TestGlibCBug (0 ms) [--] 2 tests from IFile (240 ms total) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved MAPREDUCE-5991. -- Resolution: Fixed Committed to branch. Thanks Todd. native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
Chengbing Liu created MAPREDUCE-6003: Summary: Resource Estimator suggests huge map output in some cases Key: MAPREDUCE-6003 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.2.1 Reporter: Chengbing Liu In some cases, ResourceEstimator can return a map output estimate that is far too large. This happens when the input size is not correctly calculated. A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first, and they have an input length of 0 due to its TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task. There are two possible solutions to this problem: (1) Make the input size correct for each case, e.g. HBase, etc. (2) Use another algorithm to estimate the map output, or at least make it closer to reality. I prefer the second way, since the first would require every possibility to be handled, which is not easy for some inputs such as URIs. In my opinion, we could make a second estimation which is independent of the input size: estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10 Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space. The former estimation goes like this: estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize My suggestion is to take the minimum of the two estimations: estimation = min(estimationA, estimationB) -- This message was sent by Atlassian JIRA (v6.2#6252)
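The two formulas proposed above can be expressed directly in code. This is a sketch of the proposal, not the actual ResourceEstimator implementation; the variable names follow the comment.

```java
// Sketch of the proposed estimate: take the minimum of the existing
// input-size-based estimate (estimationA, which explodes when the recorded
// input size is wrong) and a new input-size-independent estimate
// (estimationB, padded by a factor of 10 to stay conservative).
public class MapOutputEstimate {
  static long estimate(long inputSize, long completedMapInputSize,
                       long completedMapOutputSize, int completedMaps,
                       int totalMaps) {
    // estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize
    long estimationA = (long)
        ((inputSize * completedMapOutputSize * 2.0) / completedMapInputSize);
    // estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10
    long estimationB =
        (completedMapOutputSize / completedMaps) * (long) totalMaps * 10;
    return Math.min(estimationA, estimationB);
  }
}
```

With a wildly inflated input size, estimationB caps the result; with a sane input size, estimationA usually wins.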
[jira] [Updated] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated MAPREDUCE-5976: -- Attachment: mapreduce-5976-v2.txt native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-883) harchive: Document how to unarchive
[ https://issues.apache.org/jira/browse/MAPREDUCE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073222#comment-14073222 ] Allen Wittenauer commented on MAPREDUCE-883: +1 harchive: Document how to unarchive --- Key: MAPREDUCE-883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-883 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, harchive Reporter: Koji Noguchi Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-883.1.patch, mapreduce-883-0.patch I was thinking of implementing harchive's 'unarchive' feature, but realized it has been implemented already ever since harchive was introduced. It just needs to be documented. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073229#comment-14073229 ] Sean Zhong commented on MAPREDUCE-5976: --- Thanks, Manu. Looks good, +1 Changes in the new patch: 1. use system-provided snappy header files, removing the builtin snappy headers 2. the Java side delegates the codec check to a native function, NativeRuntime.supportCompressionCodec(codecName : String) native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
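The delegation described above (Java asks the native library whether a codec was compiled in, instead of assuming snappy exists) looks roughly like this. In the real patch the check is a JNI call into libnativetask; here a fixed set simulates the native side's answer, and all names are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the codec-availability check. The real code would declare
// "static native boolean supportCompressionCodec(...)" and implement it in
// libnativetask; this simulation models a build where snappy was missing.
public class CodecSupportSketch {
  private static final Set<String> BUILT_IN =
      new HashSet<>(Arrays.asList("zlib")); // snappy not compiled in

  static boolean supportCompressionCodec(String codecName) {
    return BUILT_IN.contains(codecName);
  }

  // Java side delegates the decision instead of failing at build time.
  static void checkCodec(String codecName) {
    if (!supportCompressionCodec(codecName)) {
      throw new UnsupportedOperationException(
          "Native-task build does not support codec: " + codecName);
    }
  }
}
```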
[jira] [Commented] (MAPREDUCE-5976) native-task should not fail to build if snappy is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073231#comment-14073231 ] Manu Zhang commented on MAPREDUCE-5976: --- patch updated. Changes include: 1. check, on the Java side, whether each compression codec has native support built in 2. remove the bundled snappy header files and use the system snappy library in the native code native-task should not fail to build if snappy is missing - Key: MAPREDUCE-5976 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5976 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Sean Zhong Attachments: mapreduce-5976-v2.txt, mapreduce-5976.txt Other native parts of Hadoop will automatically disable snappy support if snappy is not present and -Drequire.snappy is not passed. native-task should do the same. (right now, it fails to build if snappy is missing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated MAPREDUCE-1380: --- Hadoop Flags: (was: Incompatible change,Reviewed) Adaptive Scheduler -- Key: MAPREDUCE-1380 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.4.1 Reporter: Jordà Polo Priority: Minor Attachments: MAPREDUCE-1380-branch-1.2.patch, MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals. Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. On the other hand, the goal of this scheduler is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources. The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimate the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs. The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5991) native-task should not run unit tests if native profile is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073943#comment-14073943 ] Manu Zhang commented on MAPREDUCE-5991: --- [~tlipcon][~decster], how am I supposed to run a scenario test now? I've tried mvn test -Dtest=CompressTest and mvn test -Dtest=CompressTest -Pnative but both failed with a "Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses" exception. Removing the test-jar type of hadoop-mapreduce-client-common would work. I think the tests depend on hadoop-mapreduce-client-common for LocalJobRunner. native-task should not run unit tests if native profile is not enabled -- Key: MAPREDUCE-5991 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5991 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Todd Lipcon Assignee: Binglin Chang Attachments: MAPREDUCE-5991.v1.patch Currently, running mvn test without the 'native' profile enabled causes all of the native-task tests to fail. In order to integrate to trunk, we need to fix this - either using JUnit Assume commands in each test that depends on native code, or disabling the tests from the pom unless -Pnative is specified -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6004) native-task should not fail to build if zlib is missing
Manu Zhang created MAPREDUCE-6004: - Summary: native-task should not fail to build if zlib is missing Key: MAPREDUCE-6004 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6004 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Manu Zhang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-6004) native-task should not fail to build if zlib is missing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated MAPREDUCE-6004: -- Description: zlib is required by Gzip. We need to check for its existence in the build and exclude Gzip-related code when zlib is missing, similar to MAPREDUCE-5976. native-task should not fail to build if zlib is missing --- Key: MAPREDUCE-6004 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6004 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Reporter: Manu Zhang zlib is required by Gzip. We need to check for its existence in the build and exclude Gzip-related code when zlib is missing, similar to MAPREDUCE-5976. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-6005) native-task: fix some valgrind errors
Binglin Chang created MAPREDUCE-6005: Summary: native-task: fix some valgrind errors Key: MAPREDUCE-6005 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6005 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Running test with valgrind shows there are some bugs, this jira try to fix them. -- This message was sent by Atlassian JIRA (v6.2#6252)