[jira] [Created] (MAPREDUCE-5368) Save memory by set capacity, load factor and concurrency level for ConcurrentHashMap in TaskInProgress
zhaoyunjiong created MAPREDUCE-5368:
------------------------------------
             Summary: Save memory by setting capacity, load factor and concurrency level for ConcurrentHashMap in TaskInProgress
                 Key: MAPREDUCE-5368
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5368
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1
    Affects Versions: 1.2.0
            Reporter: zhaoyunjiong
             Fix For: 1.2.1

Below is a class histogram from our JobTracker:

 num     #instances         #bytes  class name
----------------------------------------------
   1:     136048824    11347237456  [C
   2:     124156992     5959535616  java.util.concurrent.locks.ReentrantLock$NonfairSync
   3:     124156973     5959534704  java.util.concurrent.ConcurrentHashMap$Segment
   4:     135887753     5435510120  java.lang.String
   5:     124213692     3975044400  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
   6:      63777311     3061310928  java.util.HashMap$Entry
   7:      35038252     2803060160  java.util.TreeMap
   8:      16921110     2712480072  [Ljava.util.HashMap$Entry;
   9:       4803617     2420449192  [Ljava.lang.Object;
  10:      50392816     2015712640  org.apache.hadoop.mapred.Counters$Counter
  11:       7775438     1181866576  [Ljava.util.concurrent.ConcurrentHashMap$Segment;
  12:       3882847     1118259936  org.apache.hadoop.mapred.TaskInProgress

ConcurrentHashMap takes more than 14G (5959535616 + 5959534704 + 3975044400). The trouble makers are these fields in TaskInProgress.java:

{code}
Map<TaskAttemptID, Locality> taskLocality = new ConcurrentHashMap<TaskAttemptID, Locality>();
Map<TaskAttemptID, Avataar> taskAvataar = new ConcurrentHashMap<TaskAttemptID, Avataar>();
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
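A minimal sketch of the kind of fix the issue proposes (the constructor arguments below are illustrative, not necessarily the values the patch used, and the String keys stand in for the Hadoop types): ConcurrentHashMap's default constructor allocates a 16-segment lock-striped table per map, which is what multiplies into gigabytes across millions of TaskInProgress objects. A small initial capacity plus a concurrency level of 1 shrinks each map to a single segment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TunedMaps {
    // Hypothetical stand-in for the per-TaskInProgress maps; real keys
    // would be TaskAttemptID, real values Locality/Avataar.
    static Map<String, String> makeSmallConcurrentMap() {
        // Defaults are capacity 16, load factor 0.75, concurrency level 16.
        // A TaskInProgress only ever tracks a handful of attempts, so a
        // tiny table with one lock segment wastes far less memory.
        return new ConcurrentHashMap<String, String>(2, 0.9f, 1);
    }

    public static void main(String[] args) {
        Map<String, String> m = makeSmallConcurrentMap();
        m.put("attempt_0", "NODE_LOCAL");
        System.out.println(m.get("attempt_0")); // prints NODE_LOCAL
    }
}
```

The map still behaves identically for correctness; only its internal sizing changes.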
[jira] [Created] (MAPREDUCE-5369) Progress for jobs with multiple splits in local mode is wrong
Johannes Zillmann created MAPREDUCE-5369:
-----------------------------------------
             Summary: Progress for jobs with multiple splits in local mode is wrong
                 Key: MAPREDUCE-5369
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5369
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Johannes Zillmann

In case a job with multiple splits is executed in local mode (LocalJobRunner), its progress calculation is wrong. After the first split is processed it jumps to 100%, then back to 50%, and so on. The reason lies in the progress calculation in LocalJobRunner:

{code}
float taskIndex = mapIds.indexOf(taskId);
if (taskIndex >= 0) {
  // mapping
  float numTasks = mapIds.size();
  status.setMapProgress(taskIndex/numTasks + taskStatus.getProgress()/numTasks);
} else {
  status.setReduceProgress(taskStatus.getProgress());
}
{code}

The problem is that {{mapIds}} is filled lazily in run(). There is a loop over all splits. In the loop, the split's task id is added to {{mapIds}}, then the split is processed. That means {{numTasks}} is 1 while the first split is processed, 2 while the second is processed, and so on... I tried Hadoop 0.20.2, 1.0.3, 1.1.2 and cdh-4.1. All show the same behaviour!
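The flaw can be reproduced without Hadoop. In this sketch the helper names are hypothetical; the corrected variant simply assumes the total split count is known before the loop starts, instead of using the lazily growing list size:

```java
// Hypothetical reproduction of the lazy-size flaw described above.
public class ProgressDemo {
    // Progress as LocalJobRunner computes it: the denominator is the
    // current size of mapIds, which grows one entry per processed split.
    static float lazyProgress(int taskIndex, int mapIdsSizeSoFar, float taskProgress) {
        float numTasks = mapIdsSizeSoFar;
        return taskIndex / numTasks + taskProgress / numTasks;
    }

    // Corrected variant: divide by the total number of splits, known up front.
    static float fixedProgress(int taskIndex, int totalSplits, float taskProgress) {
        return (float) taskIndex / totalSplits + taskProgress / totalSplits;
    }

    public static void main(String[] args) {
        // Two splits. While the first runs to completion, mapIds holds 1 entry:
        System.out.println(lazyProgress(0, 1, 1.0f));  // 1.0 -> reported 100%
        // The second split starts; mapIds now holds 2 entries:
        System.out.println(lazyProgress(1, 2, 0.0f));  // 0.5 -> drops back to 50%
        // With the real split count, finishing the first split reads 50%:
        System.out.println(fixedProgress(0, 2, 1.0f)); // 0.5
    }
}
```

This reproduces the 100%-then-50% jump the report describes and shows that fixing the denominator alone restores monotonic progress.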
Hadoop-Mapreduce-trunk - Build # 1475 - Failure
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1475/

##### LAST 60 LINES OF THE CONSOLE #####

[...truncated 30476 lines...]
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.913 sec
Running org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesAttempts
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.996 sec
Running org.apache.hadoop.mapreduce.v2.hs.webapp.TestBlocks
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.164 sec
Running org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServices
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.595 sec
Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryEntities
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.274 sec
Running org.apache.hadoop.mapreduce.v2.hs.TestJobListCache
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.481 sec
Running org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.925 sec
Running org.apache.hadoop.mapreduce.v2.hs.TestJobIdHistoryFileInfoMap
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.443 sec

Results :

Tests run: 142, Failures: 0, Errors: 0, Skipped: 0

[INFO] Reactor Summary:
[INFO] hadoop-mapreduce-client ........................... SUCCESS [1.661s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [35.675s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [23.837s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [2.276s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [5:44.355s]
[INFO] hadoop-mapreduce-client-hs ........................ FAILURE [1:24.779s]
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] hadoop-mapreduce-client-hs-plugins ................ SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 8:13.218s
[INFO] Finished at: Tue Jul 02 13:24:20 UTC 2013
[INFO] Final Memory: 21M/225M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test (default-test) on project hadoop-mapreduce-client-hs: ExecutionException; nested exception is java.util.concurrent.ExecutionException: java.lang.RuntimeException: The forked VM terminated without saying properly goodbye. VM crash or System.exit called ? -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-hs
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Updating HADOOP-9414
Updating HADOOP-9678
Updating HADOOP-9676
Updating HDFS-4888
Email was triggered for: Failure
Sending email for trigger: Failure

##### FAILED TESTS (if any) #####

No tests ran.
[jira] [Reopened] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta reopened MAPREDUCE-5351:
------------------------------------

Reopening, as with this fix we are seeing jobs fail with the following exception:

{code}
13/07/02 16:06:57 DEBUG mapred.JobClient: Printing tokens for job: job_201307020820_0012
13/07/02 16:06:57 DEBUG ipc.Client: IPC Client (47) connection to host/ip:50300 from hortonar sending #32
13/07/02 16:06:57 DEBUG ipc.Client: IPC Client (47) connection to host/ip:50300 from hortonar got value #32
13/07/02 16:06:57 DEBUG retry.RetryUtils: RETRY 0) policy=TryOnceThenFail, exception=org.apache.hadoop.ipc.RemoteException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:383)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1633)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1166)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:350)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3599)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3561)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
13/07/02 16:06:57 INFO mapred.JobClient: Cleaning up the staging area hdfs://host:8020/user/hortonar/.staging/job_201307020820_0012
13/07/02 16:06:57 ERROR security.UserGroupInformation: PriviledgedActionException as:hortonar cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:383)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1633)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1166)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:350)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3599)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3561)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:383)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1633)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1166)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:350)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3599)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3561)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
    at
{code}
Re: [VOTE] Release Apache Hadoop 0.23.9
+1. Downloaded the release, ran a couple of simple jobs, and everything worked.

On 7/1/13 12:20 PM, Thomas Graves <tgra...@yahoo-inc.com> wrote:
> I've created a release candidate (RC0) for hadoop-0.23.9 that I would like to release.
>
> The RC is available at: http://people.apache.org/~tgraves/hadoop-0.23.9-candidate-0/
> The RC tag in svn is here: http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.9-rc0/
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days, til July 8th.
>
> I am +1 (binding).
>
> thanks,
> Tom Graves
best way to select a BytesWritable record in a reducer
Hello,

I am trying to find the best way to write a reducer that selects a single value for each key it receives. It seems that the byte arrays inside BytesWritable instances are reused by mapreduce, requiring me to copy the contents of the buffer in order to keep a reference to the data. Here is the code I came up with. The array copy looks kind of ugly to me, and I was wondering if there is any best practice for doing this?

public static class MyReducer
    extends Reducer<Text, BytesWritable, Text, BytesWritable> {

  private byte[] copy(byte[] buffer) {
    byte[] selected = new byte[buffer.length];
    System.arraycopy(buffer, 0, selected, 0, buffer.length);
    return selected;
  }

  @Override
  public void reduce(Text key, Iterable<BytesWritable> values, Context context)
      throws IOException, InterruptedException {
    byte[] selected = null;
    for (BytesWritable value : values) {
      if (for-some-reason-I-select-this-value) {
        selected = copy(value.getBytes());
      }
    }
    context.write(key, new BytesWritable(selected));
  }
}

Thanks!
Pierre
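One subtlety worth noting in replies to questions like this: BytesWritable.getBytes() returns the whole backing array, which can be longer than the logical value, so a copy should span only getLength() bytes (newer Hadoop versions also offer BytesWritable#copyBytes() for exactly this). A minimal Hadoop-free sketch of the idea, where the padded array stands in for BytesWritable's oversized buffer:

```java
import java.util.Arrays;

public class CopyDemo {
    // getBytes() may return a backing array longer than the valid data;
    // copy only the first `length` bytes (what getLength() would report).
    static byte[] copyValid(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    public static void main(String[] args) {
        // Backing buffer padded with zeros, as BytesWritable's often is:
        byte[] backing = {1, 2, 3, 0, 0, 0, 0, 0};
        byte[] copy = copyValid(backing, 3);
        System.out.println(Arrays.toString(copy)); // [1, 2, 3]
    }
}
```

Copying buffer.length bytes, as the question's code does, works but can silently carry stale padding into the output value.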
[jira] [Created] (MAPREDUCE-5371) TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users
Xi Fang created MAPREDUCE-5371:
-------------------------------
             Summary: TestProxyUserFromEnv#testProxyUserFromEnvironment failed caused by domains of windows users
                 Key: MAPREDUCE-5371
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5371
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1-win
         Environment: Windows
            Reporter: Xi Fang
            Assignee: Xi Fang
            Priority: Minor
             Fix For: 1-win

The error message was:

Error Message
expected:[sijenkins-vm2]jenkins but was:[]jenkins

Stacktrace
at org.apache.hadoop.security.TestProxyUserFromEnv.testProxyUserFromEnvironment(TestProxyUserFromEnv.java:45)

The root cause of this failure is the domain used in user names on Windows.
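The mismatch in the error message (an expected name carrying a machine/domain part vs. a bare user name) suggests the usual normalization: Windows reports accounts as `DOMAIN\user`, while the environment variable holds only `user`. A hedged sketch of such a normalization, with a hypothetical helper name not taken from the actual patch:

```java
public class UserNames {
    // Hypothetical normalization: strip a Windows-style "DOMAIN\" prefix,
    // leaving names without a domain untouched.
    static String stripDomain(String name) {
        int i = name.lastIndexOf('\\');
        return i >= 0 ? name.substring(i + 1) : name;
    }

    public static void main(String[] args) {
        System.out.println(stripDomain("SIJENKINS-VM2\\jenkins")); // jenkins
        System.out.println(stripDomain("jenkins"));                // jenkins
    }
}
```

With both sides normalized this way, the comparison in the test would no longer depend on whether the machine is domain-joined.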
[jira] [Created] (MAPREDUCE-5372) ControlledJob#getMapredJobID capitalization is inconsistent between MR1 and MR2
Sandy Ryza created MAPREDUCE-5372:
-----------------------------------
             Summary: ControlledJob#getMapredJobID capitalization is inconsistent between MR1 and MR2
                 Key: MAPREDUCE-5372
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5372
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.1.0-beta
            Reporter: Sandy Ryza

In MR2, the 'd' in Id is lowercase, but in MR1 it is capitalized. While ControlledJob is marked as Evolving, there is no reason to be inconsistent here.
Re: [VOTE] Release Apache Hadoop 2.1.0-beta
-1. Some of the cli and distcp system tests which use hftp:// and webhdfs:// are failing on secure clusters (HDFS-4841 and HDFS-4952/HDFS-4896). This is a regression, and we need to make sure they work before we call a release.

On Wed, Jun 26, 2013 at 1:17 AM, Arun C Murthy <a...@hortonworks.com> wrote:
> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.1.0-beta that I would like to get released.
>
> This release represents a *huge* amount of work done by the community (639 fixes), which includes several major advances:
> # HDFS Snapshots
> # Windows support
> # YARN API stabilization
> # MapReduce Binary Compatibility with hadoop-1.x
> # Substantial amount of integration testing with the rest of the projects in the ecosystem
>
> The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc0/
> The RC tag in svn is here: http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc0
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/