[jira] [Reopened] (MAPREDUCE-6288) mapred job -status fails with AccessControlException
[ https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-6288: I still don't consider this fixed. I'm willing to let MR shoot itself in the foot and expose the HDFS layout of the history server. [~jlowe] may disagree with me because he still has to support this code, but please test this fix with a user that has ACL permissions to read the job status but did not launch the job. Whether the job is still up and running or not, it will fail with a permission denied error, because the original job owner is the only one in HDFS that has permissions to read the config file. mapred job -status fails with AccessControlException - Key: MAPREDUCE-6288 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.0 Reporter: Robert Kanter Assignee: Robert Kanter Priority: Blocker Fix For: 2.7.0 Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, MAPREDUCE-6288.patch After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred job -status job_1427080398288_0001}} {noformat}
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=jenkins, access=EXECUTE, inode=/user/history/done:mapred:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
    at
[jira] [Resolved] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' suffix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-5247. Resolution: Won't Fix I am happy to hear arguments as to why this is really necessary, but I would rather have my job fail than have the job give me partial/inconsistent results. FileInputFormat should filter files with '._COPYING_' suffix --- Key: MAPREDUCE-5247 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Stan Rosenberg FsShell copy/put creates staging files with a '._COPYING_' suffix. These files should be considered hidden by FileInputFormat. (A simple fix is to add the following conjunct to the existing hiddenFilter: {code} !name.endsWith("._COPYING_") {code}) After upgrading to CDH 4.2.0 we encountered this bug. We have a legacy data loader which uses 'hadoop fs -put' to load data into hourly partitions. We also have intra-hourly jobs which are scheduled to execute several times per hour using the same hourly partition as input. Thus, as the new data is continuously loaded, these staging files (i.e., ._COPYING_) are breaking our jobs (since staging files are moved when the copy/put completes). As a workaround, we've defined a custom input path filter and loaded it via mapred.input.pathFilter.class. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
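The proposed conjunct can be sketched outside of Hadoop with plain strings. The rule below mirrors FileInputFormat's default hidden-file behavior ("_" and "." prefixes) plus the suggested staging-file check; the class name is hypothetical, for illustration only.

```java
public class StagingFilterSketch {
    // Mirrors the default hidden-file rule plus the proposed conjunct
    // that skips FsShell staging files still being copied.
    static boolean accept(String name) {
        return !name.startsWith("_")
                && !name.startsWith(".")
                && !name.endsWith("._COPYING_");
    }

    public static void main(String[] args) {
        System.out.println(accept("part-00000"));            // true
        System.out.println(accept("part-00000._COPYING_"));  // false: in-flight copy
        System.out.println(accept("_SUCCESS"));              // false: already hidden
    }
}
```

In a real job the same predicate would live in a PathFilter implementation configured via mapred.input.pathFilter.class, as the reporter's workaround describes.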
[jira] [Reopened] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-4974: I am sorry about reopening this, but I did not take a look at it closely enough before I put it in. The compression code cannot be moved. isCompressedInput() uses the value of codec internally. After this change compression is always off for every input format, because codec is never set and is always null. I am happy to leave the other half of the change in place. I will push the change to subversion shortly. Optimising the LineRecordReader initialize() method --- Key: MAPREDUCE-4974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, mrv2, performance Affects Versions: 2.0.2-alpha, 0.23.5 Environment: Hadoop Linux Reporter: Arun A K Assignee: Gelesh Labels: patch, performance Fix For: 0.23.7, 2.0.5-beta Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, MAPREDUCE-4974.4.patch Original Estimate: 1h Remaining Estimate: 1h I found there is scope for optimizing the code in initialize() if we have the compressionCodecs codec instantiated only when the input is compressed. Meanwhile, Gelesh George Omathil added that we could avoid the null check of key and value. This would save time, since a null check is done for every next key/value generation. The intention being to instantiate only once and avoid NPE as well. Both could be met if the key and value are initialized in the initialize() method. We both have worked on it.
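A stripped-down sketch (hypothetical class, not the actual LineRecordReader code) of why the moved compression code broke: isCompressedInput() tests the codec field, so the codec lookup must happen before, not inside, the branch that tests it.

```java
public class CodecInitOrder {
    private Object codec; // stands in for the CompressionCodec field

    private boolean isCompressedInput() {
        return codec != null; // same idea as LineRecordReader's internal check
    }

    // Broken ordering (the reverted change): the lookup sits inside a path
    // guarded by isCompressedInput(), which can never be true because codec
    // is still null, so every input is treated as uncompressed.
    void initializeBroken(boolean fileHasCodec) {
        if (isCompressedInput()) {
            codec = fileHasCodec ? new Object() : null; // unreachable
        }
    }

    // Working ordering: resolve the codec first, then branch on it.
    void initializeFixed(boolean fileHasCodec) {
        codec = fileHasCodec ? new Object() : null;
        if (isCompressedInput()) {
            // set up the compressed-stream reader here
        }
    }

    boolean compressed() { return isCompressedInput(); }
}
```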
[jira] [Created] (MAPREDUCE-5103) Remove dead code QueueManager and JobEndNotifier
Robert Joseph Evans created MAPREDUCE-5103: -- Summary: Remove dead code QueueManager and JobEndNotifier Key: MAPREDUCE-5103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5103 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Robert Joseph Evans There are a few classes that are dead or duplicate code at this point.
org/apache/hadoop/mapred/JobEndNotifier.java
org/apache/hadoop/mapred/QueueManager.java
org/apache/hadoop/mapred/QueueConfigurationParser.java
org/apache/hadoop/mapred/DeprecatedQueueConfigurationParser.java
LocalRunner is currently using the JobEndNotifier, but there is a replacement for it in MRv2: org.apache.hadoop.mapreduce.v2.app.JobEndNotifier. The two should be combined together and the duplicate code removed. There appears to be only one method called on the QueueManager, and it appears to set a property that is not used any more, so it can be removed.
[jira] [Created] (MAPREDUCE-5104) Deprecate and Remove PathCleanupQueue
Robert Joseph Evans created MAPREDUCE-5104: -- Summary: Deprecate and Remove PathCleanupQueue Key: MAPREDUCE-5104 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5104 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Robert Joseph Evans CleanupQueue and InlineCleanupQueue appear to be dead code. However, they are not marked as private and InlineCleanupQueue is part of UtilsForTests, so they could be used by other projects as part of their testing. We should deprecate these in branch-2 and remove them from trunk. There is no point in continuing to test dead code.
[jira] [Resolved] (MAPREDUCE-4354) Performance improvement with compressor object reinit restriction
[ https://issues.apache.org/jira/browse/MAPREDUCE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4354. Resolution: Invalid As this is an issue for a different project I am closing this as invalid. The latest LZO code actually looks very similar to this patch so it may not be necessary for newer versions of the LzoCompressor. Performance improvement with compressor object reinit restriction - Key: MAPREDUCE-4354 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4354 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance Affects Versions: 0.20.205.0 Reporter: Ankit Kamboj Priority: Minor Labels: performance Fix For: 0.20.205.0 Attachments: codec_reinit_diff, modify_lzo_codec_reinit The HADOOP-5879 patch aimed at picking the conf (instead of default) settings for GzipCodec. It also involved re-initializing the recycled compressor object. In our performance tests, this re-initialization led to a performance degradation of 15% for LzoCodec, because re-initialization for Lzo involves reallocation of buffers. LzoCodec takes the initial settings from config so it is not necessary to re-initialize it. This patch checks for the codec class and calls reinit only if the codec class is Gzip. This led to a significant performance improvement of 15% for LzoCodec.
[jira] [Created] (MAPREDUCE-5082) CodecPool should avoid OOMs with buggy codecs
Robert Joseph Evans created MAPREDUCE-5082: -- Summary: CodecPool should avoid OOMs with buggy codecs Key: MAPREDUCE-5082 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5082 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans I recently found a bug in the gpl compression libraries that was causing map tasks for a particular job to OOM. https://github.com/omalley/hadoop-gpl-compression/issues/3 Now granted it does not make a lot of sense for a job to use the LzopCodec for map output compression over the LzoCodec, but arguably other codecs could be doing similar things and causing the same sort of memory leaks. I propose that we do a sanity check when creating a new decompressor/compressor. If the newly created object does not match the value from getType... we should turn off caching for that Codec.
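A hedged sketch of the proposed guard (hypothetical class, not the actual CodecPool API): before pooling an object, verify it is an instance of the class the codec advertises; a mismatched (buggy) codec gets caching disabled entirely, so its objects cannot pile up and OOM the task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class GuardedPool {
    private final Map<Class<?>, Deque<Object>> pool = new HashMap<>();
    private final Map<Class<?>, Boolean> cachingDisabled = new HashMap<>();

    public void release(Class<?> advertisedType, Object compressor) {
        if (!advertisedType.isInstance(compressor)) {
            // Sanity check failed: stop caching for this codec entirely.
            cachingDisabled.put(advertisedType, true);
            pool.remove(advertisedType);
            return;
        }
        if (!cachingDisabled.getOrDefault(advertisedType, false)) {
            pool.computeIfAbsent(advertisedType, k -> new ArrayDeque<>()).push(compressor);
        }
    }

    public boolean isCached(Class<?> type) {
        Deque<Object> d = pool.get(type);
        return d != null && !d.isEmpty();
    }
}
```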
[jira] [Reopened] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-5028: I reverted the changes from branch-0.23 too. Reopening so we can take a look at how to fix the patch. Maps fail when io.sort.mb is set to high value -- Key: MAPREDUCE-5028 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Fix For: 1.2.0, 0.23.7, 2.0.5-beta Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, mr-5028-trunk.patch, org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt Verified the problem exists on branch-1 with the following configuration: Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648 Run teragen to generate 4 GB data Maps fail when you run wordcount on this configuration with the following error: {noformat}
java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
    at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
{noformat}
[jira] [Created] (MAPREDUCE-5060) Fetch failures that time out only count against the first map task
Robert Joseph Evans created MAPREDUCE-5060: -- Summary: Fetch failures that time out only count against the first map task Key: MAPREDUCE-5060 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5060 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans Priority: Critical When a fetch failure happens, if the socket has already connected, it is only counted against the first map task. But most of the time it is because of an issue with the Node itself, not the individual map task, and as such all failures when trying to initiate the connection should count against all of the tasks. This caused a particularly unfortunate job to take an hour and a half longer than it needed to.
[jira] [Resolved] (MAPREDUCE-5051) Combiner not used when NUM_REDUCES=0
[ https://issues.apache.org/jira/browse/MAPREDUCE-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-5051. Resolution: Won't Fix If you feel strongly that this should be supported you can reopen this JIRA as new feature work. Combiner not used when NUM_REDUCES=0 Key: MAPREDUCE-5051 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5051 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 2.0.2-alpha Environment: CDH4.1.2 MR1 Reporter: Damien Hardy We have an M/R job that uses a Mapper + Combiner but has nothing to do in the Reducer: bulk indexing of HBase data in ElasticSearch, where the Map output is K / V: #bulk / json_data_to_be_indexed. So the job is launched, maps work, combiners index, and a reducer is created for nothing (sometimes waiting for another M/R job to free a tasktracker slot for the reducer, cf. MAPREDUCE-5019). When we put {{job.setNumReduceTasks(0);}} in our job's .run(), mappers are started but combiners are not used.
[jira] [Resolved] (MAPREDUCE-4940) division by zero in getLocalPathForWrite()
[ https://issues.apache.org/jira/browse/MAPREDUCE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4940. Resolution: Duplicate division by zero in getLocalPathForWrite() -- Key: MAPREDUCE-4940 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4940 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu see https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/ {code}
2013-01-12 11:53:52,809 WARN [AsyncDispatcher event handler] resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1357991604658_0002 failed 1 times due to AM Container for appattempt_1357991604658_0002_01 exited with exitCode: -1000 due to: java.lang.ArithmeticException: / by zero
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
.Failing this attempt.. Failing the application. APPID=application_1357991604658_0002 {code} Here is the related code: {code}
// Keep rolling the wheel till we get a valid path
Random r = new java.util.Random();
while (numDirsSearched < numDirs && returnPath == null) {
  long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
{code} My guess is that totalAvailable was 0, meaning dirDF was empty.
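A guard along the lines of that guess (hypothetical helper class; the real fix belongs in LocalDirAllocator) would turn the bare ArithmeticException into a clear error when no local directory has capacity:

```java
import java.util.Random;

public class DirChooser {
    // Mirrors the modulo in getLocalPathForWrite(), but checks the divisor
    // first: totalAvailable == 0 means dirDF reported no usable space.
    static long pickPosition(Random r, long totalAvailable) {
        if (totalAvailable <= 0) {
            throw new IllegalStateException("no local directory has available space");
        }
        // Note: Math.abs(Long.MIN_VALUE) is negative; kept as in the original code.
        return Math.abs(r.nextLong()) % totalAvailable;
    }
}
```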
[jira] [Created] (MAPREDUCE-4912) Investigate ways to clean up double job commit prevention
Robert Joseph Evans created MAPREDUCE-4912: -- Summary: Investigate ways to clean up double job commit prevention Key: MAPREDUCE-4912 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4912 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Robert Joseph Evans Once MAPREDUCE-4819 goes in, it fixes the issue where an OutputCommitter can double commit a job, so that the output will never be touched after the job externally reports success or failure. The code and design could potentially use some cleanup and refactoring. Issues brought up that should be investigated include:
# Reporting KILL for killed jobs if they crash after the kill happens, instead of error.
# Using the job history log for recording the commit status instead of separate external files in HDFS.
# Placing the recovery/retry logic in the commit handler instead of the MRAppMaster, and having the recovery service replay the logs as it normally does for recovery.
These are not meant to be things that must be done, but alternatives that might clean up the code.
[jira] [Created] (MAPREDUCE-4901) JobHistoryEventHandler errors should be fatal
Robert Joseph Evans created MAPREDUCE-4901: -- Summary: JobHistoryEventHandler errors should be fatal Key: MAPREDUCE-4901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4901 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.0-alpha, 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans To be able to truly fix issues like MAPREDUCE-4819 and MAPREDUCE-4832, we need a two-phase commit where a subsequent AM can be sure that at a specific point in time it knows exactly if any tasks/jobs are committing. The job history log is already used for similar functionality so we would like to reuse this, but we need to be sure that errors while writing out to the job history log are made fatal.
[jira] [Created] (MAPREDUCE-4888) NLineInputFormat drops data in 1.1 and beyond
Robert Joseph Evans created MAPREDUCE-4888: -- Summary: NLineInputFormat drops data in 1.1 and beyond Key: MAPREDUCE-4888 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4888 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.0 Reporter: Robert Joseph Evans Priority: Blocker When trying to root cause why MAPREDUCE-4782 did not cause us issues on 1.0.2, I found out that HADOOP-7823 introduced essentially the exact same error into org.apache.hadoop.mapred.lib.NLineInputFormat. In 1.X, org.apache.hadoop.mapred.lib.NLineInputFormat and org.apache.hadoop.mapreduce.lib.input.NLineInputFormat are separate implementations. The latter had an off-by-one error in it until MAPREDUCE-4782 fixed it. The former had no error in it until HADOOP-7823 introduced it in 1.1, and MAPREDUCE-375 combined the implementations together but picked the implementation with the off-by-one error in 0.21. I will attach a patch that exposes the error.
[jira] [Created] (MAPREDUCE-4832) MR AM can get in a split brain situation
Robert Joseph Evans created MAPREDUCE-4832: -- Summary: MR AM can get in a split brain situation Key: MAPREDUCE-4832 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 0.23.5, 2.0.2-alpha Reporter: Robert Joseph Evans It is possible for a networking issue to happen where the RM thinks an AM has gone down and launches a replacement, but the previous AM is still up and running. If the previous AM does not need any more resources from the RM it could try to commit either tasks or jobs. This could cause lots of problems where the second AM finishes and tries to commit too. This could result in data corruption.
[jira] [Created] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP
Robert Joseph Evans created MAPREDUCE-4833: -- Summary: Task can get stuck in FAIL_CONTAINER_CLEANUP Key: MAPREDUCE-4833 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.5 Reporter: Robert Joseph Evans Priority: Critical If an NM goes down and the AM still tries to launch a container on it, the ContainerLauncherImpl can get stuck in an RPC timeout. At the same time the RM may notice that the NM has gone away and inform the AM of this; this triggers a TA_FAILMSG. If the TA_FAILMSG arrives at the TaskAttemptImpl before the TA_CONTAINER_LAUNCH_FAILED message, then the task attempt will try to kill the container, but the ContainerLauncherImpl will not send back a TA_CONTAINER_CLEANED event, causing the attempt to be stuck.
[jira] [Created] (MAPREDUCE-4822) Unnecessary conversions in History Events
Robert Joseph Evans created MAPREDUCE-4822: -- Summary: Unnecessary conversions in History Events Key: MAPREDUCE-4822 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 0.23.4 Reporter: Robert Joseph Evans Priority: Trivial There are a number of conversions in the Job History Event classes that are totally unnecessary. It appears that they were originally used to convert from the internal avro format, but now many of them do not pull the values from the avro; they store them internally. For example: {code:title=TaskAttemptFinishedEvent.java}
/** Get the task type */
public TaskType getTaskType() {
  return TaskType.valueOf(taskType.toString());
}
{code} The code is currently taking an enum, converting it to a string, and then asking the same enum to convert it back to an enum. If Java works properly this should be a no-op, and a reference to the original taskType should be returned. There are several places where toString is called on a String; since Strings are immutable it returns a reference to itself. The various ids are not immutable and probably should not be changed at this point.
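The no-op claim is easy to demonstrate in isolation: Enum.valueOf returns the single shared constant, so the round-trip hands back the very same object the getter already holds (hypothetical two-value enum for illustration):

```java
enum TaskType { MAP, REDUCE }

public class EnumRoundTrip {
    public static void main(String[] args) {
        TaskType t = TaskType.MAP;
        // The conversion the event classes perform today:
        TaskType converted = TaskType.valueOf(t.toString());
        // Same constant comes back; the getter could just return the field.
        System.out.println(converted == t); // true
    }
}
```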
[jira] [Resolved] (MAPREDUCE-4480) T_ATTEMPT_KILLED after SUCCEEDED can happen for reduces too
[ https://issues.apache.org/jira/browse/MAPREDUCE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4480. Resolution: Not A Problem This appears to no longer be a problem. Several JIRAs have modified the MapRetroactiveKilledTransition to the point that now the only time this can happen is if a successful reduce task attempt is killed after it has succeeded, which should never happen, because the reduce succeeded. T_ATTEMPT_KILLED after SUCCEEDED can happen for reduces too Key: MAPREDUCE-4480 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4480 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.0.2-alpha Reporter: Robert Joseph Evans Priority: Critical This does not seem to impact 0.23. If speculative execution is enabled then a T_ATTEMPT_KILLED event can come in after the task has transitioned to SUCCEEDED. This causes the MapRetroactiveKilledTransition to kill the Job, because it expects to only handle map tasks.
[jira] [Created] (MAPREDUCE-4775) Reducer will never commit suicide
Robert Joseph Evans created MAPREDUCE-4775: -- Summary: Reducer will never commit suicide Key: MAPREDUCE-4775 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4775 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical In 1.0 there are a number of conditions that will cause a reducer to commit suicide and exit. This includes if it is stalled, or if the error percentage of total fetches is too high. In the new code it will only commit suicide when the total number of failures for a single task attempt is >= max(30, totalMaps/10). In the best case, with the quadratic back-off, it would take 20.5 hours for a single map attempt to reach 30 failures. And unless there is only one reducer running, the map task would have been restarted before then. We should go back to including the same reducer suicide checks that are in 1.0.
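As a rough consistency check on the 20.5-hour figure, assume a pure quadratic schedule where the wait before failure k is unit × k² (a simplification of the real fetcher logic; the unit is derived here, not a Hadoop constant):

```java
public class BackoffCheck {
    public static void main(String[] args) {
        long sumSquares = 0;
        for (int k = 1; k <= 30; k++) {
            sumSquares += (long) k * k; // 1 + 4 + 9 + ... + 900 = 9455
        }
        // Spreading 20.5 hours over that schedule implies the per-retry unit.
        double unitSeconds = 20.5 * 3600 / sumSquares;
        System.out.printf("sum of squares: %d, implied unit: %.1f s%n",
                sumSquares, unitSeconds); // ~7.8 s per unit
    }
}
```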
[jira] [Created] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted
Robert Joseph Evans created MAPREDUCE-4772: -- Summary: Fetch failures can take way too long for a map to be restarted Key: MAPREDUCE-4772 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.4 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical In one particular case we saw an NM go down at just the right time, such that most of the reducers got the output of the map tasks, but not all of them. The ones that failed to get the output reported to the AM rather quickly that they could not fetch from the NM, but because the other reducers were still running, the AM would not relaunch the map task, because fewer than 50% of the running reducers had reported fetch failures. Then, because of the exponential back-off for fetches on the reducers, it took 1 hour 45 min for the reduce tasks to hit another 10 fetch failures and report in again. At that point the other reducers had finished and the job relaunched the map task. If the reducers had still been running at 1:45 I have no idea how long it would have taken for each of the tasks to get to 30 fetch failures. We need to trigger the map restart based on the percentage of reducers shuffling, not the percentage of reducers running. We also need a maximum limit on the back-off, so that we don't ever have a reducer waiting for days to try and fetch map output.
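The proposed "maximum limit of the back off" can be sketched as a simple cap on the retry delay. The base delay, cap, and quadratic growth here are illustrative assumptions, not Hadoop's actual fetch-retry constants:

```java
public class CappedBackoff {
    // Hypothetical capped schedule: delay grows quadratically with the
    // failure count but never exceeds maxMs, so a reducer can't end up
    // waiting days between fetch attempts.
    static long delayMs(int failures, long baseMs, long maxMs) {
        long uncapped = baseMs * (long) failures * failures;
        return Math.min(uncapped, maxMs);
    }

    public static void main(String[] args) {
        long base = 1_000, max = 600_000; // assumed: 1 s unit, 10 min cap
        System.out.println(delayMs(3, base, max));  // 9000
        System.out.println(delayMs(50, base, max)); // 600000: cap kicks in
    }
}
```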
[jira] [Created] (MAPREDUCE-4766) In diagnostics, task ids and task attempt ids should become clickable links
Robert Joseph Evans created MAPREDUCE-4766: -- Summary: In diagnostics, task ids and task attempt ids should become clickable links Key: MAPREDUCE-4766 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4766 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.5 Reporter: Robert Joseph Evans It would be great if task ids and task attempt ids that appear in the diagnostics were rendered as clickable links. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4760) Make a version of Counters that is composite for the job and stores the counter values in arrays.
Robert Joseph Evans created MAPREDUCE-4760: -- Summary: Make a version of Counters that is composite for the job and stores the counter values in arrays. Key: MAPREDUCE-4760 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4760 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.4, 2.0.2-alpha Reporter: Robert Joseph Evans Priority: Minor String interning reduced the size of counters a lot. After that and the fix for a memory leak in the IPC server, a job with 20,000 map tasks and 3,000 reducers takes about 200MB to store the state of all of the tasks. Looking at a memory dump of the AM, each task attempt has a pointer to a Counters object that is about 2KB to 3KB in size. That means Counters account for about 56MB of the 200MB of state. This job only had about 40 task counters in it. Each counter stores a long value, so if we stored them in a long[] instead we should only be taking up 7MB. Also, assuming that some of the counters only appear in a map task or a reduce task, we should be able to have one CompositeCounters for map tasks and one for reduce tasks, which would reduce the size even further. NOTE: without this change I would expect to be able to run a 100,000 task job in the default 1024MB AM heap (875MB/200MB * 23,000); I reserved 150MB for IPC buffers and event data. With this change we could expect to run about 130,000 tasks (875MB/150MB * 23,000). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
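A minimal sketch of the array-backed idea: one shared name-to-index map per job, with each task attempt holding only a long[] of values. The class and method names here are invented for illustration, not the real MapReduce API.

```java
// Sketch: a per-job registry mapping counter names to array slots, so each
// task attempt stores a bare long[] instead of a heavyweight Counters object.
// All names here are illustrative.
import java.util.HashMap;
import java.util.Map;

public class CompositeCounters {
    private final Map<String, Integer> slots = new HashMap<>(); // shared per job

    // Returns the array index for a counter name, assigning one on first use.
    public int register(String counterName) {
        Integer i = slots.get(counterName);
        if (i == null) {
            i = slots.size();
            slots.put(counterName, i);
        }
        return i;
    }

    // Per-attempt storage: 8 bytes per registered counter.
    public long[] newValues() {
        return new long[slots.size()];
    }
}
```

With 40 registered counters this is roughly 320 bytes of values per attempt, versus the 2-3KB per attempt observed for full Counters objects.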
[jira] [Created] (MAPREDUCE-4752) Reduce MR AM memory usage through String Interning
Robert Joseph Evans created MAPREDUCE-4752: -- Summary: Reduce MR AM memory usage through String Interning Key: MAPREDUCE-4752 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4752 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans There are a lot of strings that are duplicates of one another in the AM. This comes from all of the PB events that come across the wire and also tasks heart-beating in through the umbilical. There are even several duplicates from Configuration. By interning all of these strings on the heap I have been able to reduce the resting memory usage of the AM to about 5KB per task attempt, with about half of that coming from counters. This results in a 5MB heap for a typical 1000 task job, or a 500MB heap for a 100,000 task attempt job. I think I could cut the size of the counters in half by completely rewriting how counters work in the AM and History Server, but I don't think it is worth it at this point. I am still investigating what the memory usage of the AM is like when running very large jobs, and I will probably have a follow-up JIRA for reducing that memory usage as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
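The interning technique can be illustrated with a tiny map-backed deduplicator. This is a simplified stand-in for whatever utility the patch uses, not the real implementation; a production version would likely want weak references so strings that fall out of use can still be garbage collected.

```java
// Simplified illustration of string deduplication: equal strings arriving
// from RPC events or heartbeats collapse to one shared instance on the heap.
// Stand-in only; not the actual Hadoop utility (which would also need to
// avoid pinning strings forever, e.g. via weak references).
import java.util.concurrent.ConcurrentHashMap;

public class StringDeduper {
    private static final ConcurrentHashMap<String, String> POOL =
            new ConcurrentHashMap<>();

    public static String dedupe(String s) {
        if (s == null) {
            return null;
        }
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}
```

The effect is that thousands of equal hostnames, counter names, and state strings each cost one String on the heap instead of one per task attempt.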
[jira] [Resolved] (MAPREDUCE-4549) Distributed cache conflicts break backwards compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4549. Resolution: Fixed Distributed cache conflicts break backwards compatibility -- Key: MAPREDUCE-4549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical Fix For: 0.23.5 Attachments: MR-4549-branch-0.23.txt I recently put in MAPREDUCE-4503, which went a bit too far and broke backwards compatibility with 1.0 in distributed cache entries. Instead of changing the behavior of the distributed cache to more closely match 1.0 behavior, I want to just change the exception to a warning message informing the users that it will become an error in 2.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings
[ https://issues.apache.org/jira/browse/MAPREDUCE-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4303. Resolution: Duplicate Look at using String.intern to dedupe some Strings -- Key: MAPREDUCE-4303 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Robert Joseph Evans MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places where it is not as simple to remove the duplicates. In these cases the source of the strings is an incoming RPC call or parsing and reading in a file. The only real way to dedupe these is either to use String.intern(), which if not used properly could result in the permgen space being filled up, or to play games with our own cache, trying to do the same sort of thing as String.intern but in the heap. The following are some places where I saw lots of duplicate strings that we should look at doing something about: TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString; MapTaskAttemptImpl.diagnostics; the keys to Counters.groups; GenericGroup.displayName; the keys to GenericGroup.counters; and GenericCounter.displayName. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4748) Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
Robert Joseph Evans created MAPREDUCE-4748: -- Summary: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED Key: MAPREDUCE-4748 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4748 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3 Reporter: Robert Joseph Evans We saw this happen when running a large pig script. {noformat} 2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event at current state for task_1350837501057_21978_m_040453 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) {noformat} Speculative execution was enabled, and that task did speculate so it looks like this is an error in the state machine either between the task attempts or just within that single task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4731) FSShell double encodes qualified Paths
Robert Joseph Evans created MAPREDUCE-4731: -- Summary: FSShell double encodes qualified Paths Key: MAPREDUCE-4731 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4731 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.2-alpha, 0.23.3 Reporter: Robert Joseph Evans {noformat}
$ hadoop fs -mkdir /tmp/me
$ hadoop fs -touchz /tmp/me/A%3AB
$ hadoop fs -ls /tmp/me/A%3AB
Found 1 items
-rw--- 3 me hdfs 0 2012-10-18 17:47 /tmp/me/A%3AB
$ hadoop fs -ls hdfs:///tmp/me/A%3AB
Found 1 items
-rw--- 3 me hdfs 0 2012-10-18 17:47 hdfs:///tmp/me/A%253AB
$ hadoop fs -cat hdfs:///tmp/me/A%3AB
cat: File does not exist: /tmp/me/A%253AB
$ hadoop fs -cat /tmp/me/A%3AB
{noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
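The transcript above shows the classic double-encoding pitfall: re-encoding an already-encoded name escapes the '%' itself, so "A%3AB" becomes "A%253AB" and names a different file. Plain java.net.URLEncoder reproduces the effect (the bug itself lives in FsShell's Path handling, not in this snippet):

```java
// Demonstrates double encoding: percent-encoding a string that already
// contains an escape turns "%3A" into "%253A", because '%' -> "%25".
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DoubleEncode {
    public static String encodeOnce(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String name = "A%3AB";                // literal file name on HDFS
        System.out.println(encodeOnce(name)); // "A%253AB" -- a different file
    }
}
```

The fix direction is for code handling a qualified Path to track whether a name is already encoded and never encode it a second time.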
[jira] [Created] (MAPREDUCE-4647) We should only unjar the job jar if there is a lib directory in it.
Robert Joseph Evans created MAPREDUCE-4647: -- Summary: We should only unjar the job jar if there is a lib directory in it. Key: MAPREDUCE-4647 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4647 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans For backwards compatibility we recently made it so we would unjar the job.jar and add anything in the lib directory of that jar to the classpath. But this also slows job startup down a lot if the jar is large. We should only unjar it if actually doing so would add something new to the classpath. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
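The proposed check can be sketched as a scan of the jar's entry names: only unpack when some entry actually lives under lib/. Method names here are illustrative; the real change would live in the MapReduce job-launch path.

```java
// Sketch: decide whether unpacking the job jar would add anything to the
// classpath by checking for entries under lib/. Names are illustrative.
import java.io.File;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Stream;

public class JobJarCheck {
    // Core rule, separated out for testability: any real entry under lib/
    // (not just the bare "lib/" directory entry) means unjarring is useful.
    public static boolean hasLibEntries(Stream<String> entryNames) {
        return entryNames.anyMatch(n -> n.startsWith("lib/") && !n.equals("lib/"));
    }

    // Convenience wrapper that reads the entry names from an actual jar file.
    public static boolean shouldUnjar(File jobJar) throws IOException {
        try (JarFile jar = new JarFile(jobJar)) {
            return hasLibEntries(jar.stream().map(JarEntry::getName));
        }
    }
}
```

Scanning the central directory of a jar is cheap compared to extracting a large jar, so the check pays for itself whenever the jar has no lib/ entries.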
[jira] [Created] (MAPREDUCE-4611) MR AM dies badly when Node is decommissioned
Robert Joseph Evans created MAPREDUCE-4611: -- Summary: MR AM dies badly when Node is decommissioned Key: MAPREDUCE-4611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.0-alpha, 0.23.3, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans The MR AM always thinks that it is being killed by the RM when it gets a kill signal and it has not finished processing yet. In reality the RM kill signal is only sent when the client cannot communicate directly with the AM, which probably means that the AM is in a bad state already. The much more common case is that the node is marked as unhealthy or decommissioned. I propose that in the short term the AM will only clean up if:
# The process has been asked by the client to exit (kill)
# The job has finished cleanly and is exiting already
# This is the last retry of the AM
The downside here is that the .staging directory will be leaked and the job will not show up in the history server on a kill from the RM in some cases, at least until the full set of AM cleanup issues can be addressed, probably as part of MAPREDUCE-4428 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4600) TestTokenCache.java from MRV1 no longer compiles
Robert Joseph Evans created MAPREDUCE-4600: -- Summary: TestTokenCache.java from MRV1 no longer compiles Key: MAPREDUCE-4600 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4600 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Daryn Sharp Priority: Critical {noformat}
[javac] hadoop-mapreduce-project/build.xml:569: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 95 source files to hadoop-mapreduce-project/build/test/mapred/classes
[javac] hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:291: cannot find symbol
[javac] symbol : method getDelegationToken(org.apache.hadoop.security.Credentials,java.lang.String)
[javac] location: class org.apache.hadoop.mapreduce.security.TokenCache
[javac] Token<DelegationTokenIdentifier> nnt = TokenCache.getDelegationToken(
[javac] ^
[javac] hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:350: cannot find symbol
[javac] symbol : method getDelegationTokens(java.lang.String)
[javac] location: class org.apache.hadoop.hdfs.HftpFileSystem
[javac] }}).when(hfs).getDelegationTokens(renewer);
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 2 errors
{noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4539) Please delete me
[ https://issues.apache.org/jira/browse/MAPREDUCE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4539. Resolution: Duplicate Please delete me Key: MAPREDUCE-4539 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4539 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Trivial I am in a bad state; will someone please delete me? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4538) Please delete me
[ https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4538. Resolution: Duplicate Please delete me Key: MAPREDUCE-4538 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Trivial Attachments: MR-4538.txt I am in a bad state; will someone please delete me. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4549) Distributed cache conflicts break backwards compatibility
Robert Joseph Evans created MAPREDUCE-4549: -- Summary: Distributed cache conflicts break backwards compatibility Key: MAPREDUCE-4549 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical I recently put in MAPREDUCE-4503, which went a bit too far and broke backwards compatibility with 1.0 in distributed cache entries. This is to change the behavior of the distributed cache to more closely match that of 1.0. In 1.0, when adding in a cache archive link, the first link would win (be the one that was created), not the last one as is the current behavior; when there were conflicts, all of the others were ignored and just did not get a symlink created; and finally, no symlink was created for archives that did not have a fragment in the URL. To simulate this behavior, after we parse the cache files and cache archives configuration we should walk through all conflicting links and pick the first link that has a fragment to win. If no link has a fragment then the first link just wins. All other conflicting links will get a warning, and the name of the link will be changed to include a UUID. If the same file is in the distributed cache as both a cache file and a cache archive we will throw an exception, for backwards compatibility. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
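The resolution rule described above — the first fragment-bearing link wins, otherwise the first link, with losers warned and renamed via a UUID — can be sketched like this. The class and method names are invented; the real logic sits in the distributed-cache setup code.

```java
// Sketch of the 1.0-style conflict resolution for distributed cache links.
// Among URIs competing for one link name: the first URI with a fragment wins;
// with no fragments, the first URI wins; every loser gets a UUID-suffixed name.
// Assumes `conflicting` is non-empty.
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class CacheLinkResolver {
    public static Map<URI, String> resolve(String linkName, List<URI> conflicting) {
        URI winner = conflicting.stream()
                .filter(u -> u.getFragment() != null)
                .findFirst()
                .orElse(conflicting.get(0)); // no fragments: first link wins
        Map<URI, String> links = new LinkedHashMap<>();
        for (URI u : conflicting) {
            links.put(u, u.equals(winner)
                    ? linkName
                    : linkName + "-" + UUID.randomUUID()); // warned + renamed
        }
        return links;
    }
}
```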
[jira] [Created] (MAPREDUCE-4504) SortValidator writes to wrong directory
Robert Joseph Evans created MAPREDUCE-4504: -- Summary: SortValidator writes to wrong directory Key: MAPREDUCE-4504 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4504 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans SortValidator tries to write to jobConf.get("hadoop.tmp.dir", "/tmp"), but that is not intended to be an HDFS directory. It should just be /tmp. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-3320) Error conditions in web apps should stop pages from rendering.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-3320. Resolution: Invalid Fix Version/s: (was: 0.24.0) The UI actually will do a redirect back to itself with a cookie set indicating that an error happened. This results in the page being redrawn with the error. Error conditions in web apps should stop pages from rendering. -- Key: MAPREDUCE-3320 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3320 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans There are several places in the web apps where an error condition should short circuit the page from rendering, but it does not. Ideally the web app framework should be extended to support exceptions similar to Jersey that can have an HTTP return code associated with them. Then all of the places that produce custom error pages can just throw these exceptions instead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4458) Warn if java.library.path is used
Robert Joseph Evans created MAPREDUCE-4458: -- Summary: Warn if java.library.path is used Key: MAPREDUCE-4458 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4458 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans If java.library.path is used on the command line for launching an MRAppMaster or an MR Task, it could conflict with how standard Hadoop/HDFS JNI libraries and dependencies are found. At a minimum the client should output a warning and ask the user to switch to LD_LIBRARY_PATH. It would be nice to automatically do this for them, but parsing the command line is scary, so just a warning is probably good enough for now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4423) Potential infinite fetching of map output
Robert Joseph Evans created MAPREDUCE-4423: -- Summary: Potential infinite fetching of map output Key: MAPREDUCE-4423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Inside Fetcher.java there are a few cases where an error can happen and the corresponding map task is not marked as a fetch failure. One of these is if the Shuffle server returns a malformed result. MAPREDUCE-3992 makes this case a lot less common, but it is still possible. If the shuffle handler always returns a malformed result but an OK response, the Fetcher will never stop trying to fetch those results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
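A hedged sketch of the fix direction: validate the shuffle response before trusting it, and charge a malformed-but-OK response as a fetch failure against that map so the retry loop terminates. The real ShuffleHeader is a binary Writable; the comma-separated form used here is purely illustrative.

```java
// Illustrative only: a defensive check on a (simplified, comma-separated)
// shuffle header "mapId,compressedLength,rawLength". A false result should be
// charged as a fetch failure against the map, never silently retried forever.
public class ShuffleHeaderCheck {
    public static boolean isValidHeader(String header) {
        if (header == null) {
            return false;
        }
        String[] parts = header.split(",");
        if (parts.length != 3 || parts[0].isEmpty()) {
            return false;
        }
        try {
            long compressed = Long.parseLong(parts[1]);
            long raw = Long.parseLong(parts[2]);
            return compressed > 0 && raw > 0;
        } catch (NumberFormatException e) {
            return false; // malformed lengths: count as a fetch failure
        }
    }
}
```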
[jira] [Created] (MAPREDUCE-4373) Fix Javadoc warnings in JobClient.
Robert Joseph Evans created MAPREDUCE-4373: -- Summary: Fix Javadoc warnings in JobClient. Key: MAPREDUCE-4373 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4373 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.1-alpha, 3.0.0 Reporter: Robert Joseph Evans It looks like MAPREDUCE-4355 added in two new javadoc warnings. {code} [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobClient.java:651: warning - @param argument jobid is not a parameter name. [WARNING] /home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobClient.java:669: warning - @param argument jobid is not a parameter name. {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4375) Show Configuration Traceability in MR UI
Robert Joseph Evans created MAPREDUCE-4375: -- Summary: Show Configuration Traceability in MR UI Key: MAPREDUCE-4375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4375 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Once HADOOP-8525 goes in we should provide a way for the Configuration UI to display the traceability information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4313) TestTokenCache doesn't compile due to a TokenCache.getDelegationToken compilation error
[ https://issues.apache.org/jira/browse/MAPREDUCE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4313. Resolution: Fixed Fix Version/s: 3.0.0 2.0.1-alpha I checked the small fix into branch-2, and trunk. These are the two places that the other change went in that broke this. TestTokenCache doesn't compile due to a TokenCache.getDelegationToken compilation error -- Key: MAPREDUCE-4313 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4313 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, test Reporter: Eli Collins Assignee: Robert Joseph Evans Priority: Blocker Fix For: 2.0.1-alpha, 3.0.0 Saw this on the trunk Jenkins job: {noformat}
compile-mapred-test:
[mkdir] Created dir: /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/classes
[mkdir] Created dir: /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/testjar
[mkdir] Created dir: /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/testshell
[javac] Compiling 95 source files to /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/classes
[javac] /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:292: incompatible types
[javac] found : org.apache.hadoop.security.token.Token<capture#315 of ?>
[javac] required: org.apache.hadoop.security.token.Token<org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier>
[javac] Token<DelegationTokenIdentifier> nnt = TokenCache.getDelegationToken(
[javac] ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 1 error {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4301) Dedupe some strings in MRAM for memory savings
Robert Joseph Evans created MAPREDUCE-4301: -- Summary: Dedupe some strings in MRAM for memory savings Key: MAPREDUCE-4301 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4301 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 2.0.0-alpha, 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Recently an OutOfMemoryError caused one of our jobs to become a zombie (MAPREDUCE-4300). It was a rather large job with 78000+ map tasks and only 750MB of heap configured. I took a heap dump to see if there were any obvious memory leaks, and I could not find any, but YourKit and some digging found some potential memory optimizations that we could do. In this particular case we could save about 20MB if SplitMetaInfoReader.readSplitMetaInfo only computed the JobSplitFile once instead of for each split (a 2-line change). I will look into some others and see if there are more savings I can come up with. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
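The specific 2-line fix is in Hadoop's SplitMetaInfoReader, but the pattern — hoisting a loop-invariant computation so 78,000 iterations share one String instead of allocating 78,000 equal copies — can be shown generically (names here are illustrative):

```java
// Illustrates the hoisting fix: the naive loop builds an identical String per
// split; the fixed loop computes it once and shares the single instance.
public class SplitPathExample {
    // Before: one fresh, equal-but-distinct String allocation per split.
    public static String[] buildNaive(String jobDir, int splits) {
        String[] out = new String[splits];
        for (int i = 0; i < splits; i++) {
            out[i] = jobDir + "/job.split"; // recomputed every iteration
        }
        return out;
    }

    // After: computed once outside the loop, shared by every split.
    public static String[] buildHoisted(String jobDir, int splits) {
        String splitFile = jobDir + "/job.split";
        String[] out = new String[splits];
        for (int i = 0; i < splits; i++) {
            out[i] = splitFile;
        }
        return out;
    }
}
```

For long staging-directory paths across 78,000+ splits, the duplicate copies add up to the roughly 20MB the report mentions.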
[jira] [Created] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings
Robert Joseph Evans created MAPREDUCE-4303: -- Summary: Look at using String.intern to dedupe some Strings Key: MAPREDUCE-4303 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster Affects Versions: 2.0.0-alpha, 0.23.3 Reporter: Robert Joseph Evans MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places where it is not as simple to remove the duplicates. In these cases the source of the strings is an incoming RPC call or parsing and reading in a file. The only real way to dedupe these is either to use String.intern(), which if not used properly could result in the permgen space being filled up, or to play games with our own cache, trying to do the same sort of thing as String.intern but in the heap. The following are some places where I saw lots of duplicate strings that we should look at doing something about: TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString; MapTaskAttemptImpl.diagnostics; the keys to Counters.groups; GenericGroup.displayName; the keys to GenericGroup.counters; and GenericCounter.displayName. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.
Robert Joseph Evans created MAPREDUCE-4300: -- Summary: OOM in AM can turn it into a zombie. Key: MAPREDUCE-4300 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 0.23.3 Reporter: Robert Joseph Evans It looks like 4 threads in the AM died with OOM but not the one pinging the RM. stderr for this AM {noformat} WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files. May 30, 2012 4:49:55 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information. 
May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class May 30, 2012 4:49:55 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM' May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope Singleton May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope Singleton May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope PerRequest Exception in thread ResponseProcessor for block BP-1114822160-IP-1322528669066:blk_-6528896407411719649_34227308 java.lang.OutOfMemoryError: Java heap space at com.google.protobuf.CodedInputStream.(CodedInputStream.java:538) at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55) at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201) at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738) at 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287) at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656) Exception in thread DefaultSpeculator background processing java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.resize(HashMap.java:462) at java.util.HashMap.addEntry(HashMap.java:755) at java.util.HashMap.put(HashMap.java:385) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632) at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465) at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433) at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509) at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56) at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176) at java.lang.Thread.run(Thread.java:619) Exception in thread Timer for 'MRAppMaster' metrics system java.lang.OutOfMemoryError: Java heap space Exception in thread Socket Reader #4 for port 50500 java.lang.OutOfMemoryError: Java heap space {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Resolved] (MAPREDUCE-4162) Correctly set token service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4162. Resolution: Fixed Fix Version/s: 0.23.3 Correctly set token service --- Key: MAPREDUCE-4162 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4162 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client, mrv2 Affects Versions: 0.23.0, 0.24.0, 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 0.23.3, 2.0.0, 3.0.0 Attachments: MAPREDUCE-4162.patch Use {{SecurityUtils.setTokenService}} to set token services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4237) TestNodeStatusUpdater can fail if localhost has a domain associated with it
Robert Joseph Evans created MAPREDUCE-4237: -- Summary: TestNodeStatusUpdater can fail if localhost has a domain associated with it Key: MAPREDUCE-4237 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4237 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans On some systems (RHEL, where I work), localhost can resolve to localhost.localdomain. TestNodeStatusUpdater can fail because the node id contains .localdomain, which is not expected by the hard-coded localhost string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4233) NPE can happen in RMNMNodeInfo.
Robert Joseph Evans created MAPREDUCE-4233: -- Summary: NPE can happen in RMNMNodeInfo. Key: MAPREDUCE-4233 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4233 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.3 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Critical {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.getLiveNodeManagers(RMNMInfo.java:96) at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65) at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216) at javax.management.StandardMBean.getAttribute(StandardMBean.java:358) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666) {noformat} Looks like rmcontext.getRMNodes() is not kept in sync with scheduler.getNodeReport(), so the report can be null even though the context still knows about the node. The simple fix is to add a null check. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
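The null-check fix described above can be sketched roughly as follows. The types here (the scheduler interface and node report class) are minimal hypothetical stand-ins, not the real YARN classes; only the null guard itself mirrors the suggested change.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed fix: the real RMNMInfo.getLiveNodeManagers walks
// the RM context's node list and asks the scheduler for a report per node.
public class LiveNodesSketch {

    // Hypothetical stand-in for the scheduler's per-node report.
    public static class SchedulerNodeReport {
        public final int numContainers;
        public SchedulerNodeReport(int n) { numContainers = n; }
    }

    // Hypothetical stand-in for the scheduler; may return null for a
    // node it no longer tracks.
    public interface Scheduler {
        SchedulerNodeReport getNodeReport(String nodeId);
    }

    // Builds one info string per node, tolerating nodes the scheduler
    // no longer knows about instead of throwing an NPE.
    public static List<String> getLiveNodeManagers(List<String> nodeIds,
                                                   Scheduler scheduler) {
        List<String> out = new ArrayList<>();
        for (String id : nodeIds) {
            SchedulerNodeReport report = scheduler.getNodeReport(id);
            // The fix: the context and the scheduler are not kept in
            // lockstep, so the report can legitimately be null here.
            int containers = (report == null) ? 0 : report.numContainers;
            out.add(id + ":" + containers);
        }
        return out;
    }
}
```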
[jira] [Reopened] (MAPREDUCE-4162) Correctly set token service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-4162: Looks like even though the patch applies cleanly to branch-0.23 it is missing a dependency. I am reverting the changes, just to branch-0.23 until the dependency can be addressed. Correctly set token service --- Key: MAPREDUCE-4162 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4162 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: client, mrv2 Affects Versions: 0.23.0, 0.24.0, 2.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Fix For: 2.0.0, 3.0.0 Attachments: MAPREDUCE-4162.patch Use {{SecurityUtils.setTokenService}} to set token services. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process
[ https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-4208. Resolution: Not A Problem The job is hanging up but never continuing until you kill the child process Key: MAPREDUCE-4208 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208 Project: Hadoop Map/Reduce Issue Type: Bug Environment: Hadoop 0.20.203.0 Hbase 0.90.3 Hive 0.80.1 Reporter: ccw I use the hive MR query on hbase,but the job is never end. The job is hanging but never continuing util you kill the child process 2012-04-28 18:22:33,661 Stage-1 map = 0%, reduce = 0% 2012-04-28 18:22:59,760 Stage-1 map = 25%, reduce = 0% 2012-04-28 18:23:04,782 Stage-1 map = 38%, reduce = 0% 2012-04-28 18:23:07,796 Stage-1 map = 50%, reduce = 0% 2012-04-28 18:23:08,801 Stage-1 map = 50%, reduce = 8% 2012-04-28 18:23:17,839 Stage-1 map = 50%, reduce = 17% 2012-04-28 18:23:19,848 Stage-1 map = 63%, reduce = 17% 2012-04-28 18:23:32,909 Stage-1 map = 63%, reduce = 21% 2012-04-28 18:23:57,017 Stage-1 map = 75%, reduce = 21% 2012-04-28 18:24:09,075 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:25:09,397 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:26:09,688 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:27:09,980 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:28:10,262 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:29:10,522 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:30:10,742 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:31:10,985 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:32:11,238 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:33:11,467 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:34:11,731 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:35:11,968 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:36:12,213 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:37:12,508 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:38:12,747 Stage-1 map = 75%, reduce = 25% 2012-04-28 18:39:12,970 Stage-1 map = 75%, reduce = 25% 2012-04-28 
18:40:13,205 Stage-1 map = 75%, reduce = 25% I checked the TT log, 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:47,961 INFO 
org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201204281725_0002_m_02_0 0.0% 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker:
[jira] [Resolved] (MAPREDUCE-3958) RM: Remove RMNodeState and replace it with NodeState
[ https://issues.apache.org/jira/browse/MAPREDUCE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-3958. Resolution: Fixed Fix Version/s: (was: 0.23.2) 3.0.0 2.0.0 Thanks Bikas, I put this into trunk and branch-2. +1 RM: Remove RMNodeState and replace it with NodeState Key: MAPREDUCE-3958 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3958 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Bikas Saha Assignee: Bikas Saha Fix For: 2.0.0, 3.0.0 Attachments: MAPREDUCE-3958-1.patch, MAPREDUCE-3958-2.patch, MAPREDUCE-3958-3.patch, MAPREDUCE-3958.patch RMNodeState is being sent over the wire after MAPREDUCE-3353. This has been done by cloning the enum into NodeState in yarn protocol records. That makes RMNodeState redundant and it should be replaced with NodeState. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3050) YarnScheduler needs to expose Resource Usage Information
YarnScheduler needs to expose Resource Usage Information Key: MAPREDUCE-3050 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3050 Project: Hadoop Map/Reduce Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Blocker Fix For: 0.23.0, 0.24.0 Before the recent refactor, the nodes had information in them about how many resources they were using. This information is now hidden inside SchedulerNode. Similarly, resource usage information about an application, or in aggregate, is only available through the Scheduler, and there is no interface to pull it out. We need to expose APIs to get Resource and Container information from the scheduler, in aggregate across the entire cluster, per application, per node, and ideally also per queue if applicable (although there are no JIRAs I am aware of that need this right now). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3036) Some of the Resource Manager memory metrics go negative.
Some of the Resource Manager memory metrics go negative. Key: MAPREDUCE-3036 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3036 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Blocker Fix For: 0.23.0, 0.24.0 ReservedGB seems to always be decremented when a container is released, even though the container never reserved any memory. AvailableGB also seems to be able to go negative in a few situations. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3001) Map Reduce JobHistory and AppMaster UI should have ability to display task specific counters.
Map Reduce JobHistory and AppMaster UI should have ability to display task specific counters. - Key: MAPREDUCE-3001 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3001 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Priority: Minor Fix For: 0.23.0, 0.24.0 The Map Reduce JobHistory and AppMaster UIs should have the ability to display task-specific counters. I think the best way to do this is to include in the Nav Block a task-specific section with task links when a task is selected. The Counters page is already set up to deal with a task being passed in. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-3002) Delink History Context from AppContext
Delink History Context from AppContext -- Key: MAPREDUCE-3002 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3002 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.24.0 Reporter: Robert Joseph Evans Currently the JobHistory Server has a HistoryContext that pretends to be a Map Reduce ApplicationMaster's AppContext so that UI pages can be shared between the two. This is not ideal because the UIs have already diverged a lot, and we have to translate the native History Server's data into implementations of Job to provide the same interface. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (MAPREDUCE-2936) Contrib Raid compilation broken after HDFS-1620
[ https://issues.apache.org/jira/browse/MAPREDUCE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-2936: It looks like HDFS-1620 was just merged to branch-0.23 and needs this fix in it now. Contrib Raid compilation broken after HDFS-1620 --- Key: MAPREDUCE-2936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2936 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 0.24.0 Attachments: MAPREDUCE-2936-20110906.txt After working around MAPREDUCE-2935 by removing TestServiceLevelAuthorization and runing the following: At the trunk level: mvn clean install package -Dtar -Pdist -Dmaven.test.skip.exec=true In hadoop-mapreduce-project: ant compile-contrib -Dresolvers=internal yields 14 errors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-2572. Resolution: Duplicate This is no longer relevant because MRv1 is deprecated. MAPREDUCE-2969 will do the same work for MRv2. Throttle the deletion of data from the distributed cache Key: MAPREDUCE-2572 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2572-trunk-v1.patch, THROTTLING-security-v1.patch When deleting entries from the distributed cache we do so in a background thread. Once the size limit of the distributed cache is reached, all unused entries are deleted. MAPREDUCE-2494 changes this so that entries are deleted in LRU order until the usage falls below a given threshold. In either of these cases we are periodically flooding a disk with delete requests, which can slow down all IO operations to a drive. It would be better to throttle this deletion so that it is spread out over a longer period of time. This JIRA is to add that throttling. On investigating, it seems much simpler to backport MAPREDUCE-2494 to 20S before implementing this change rather than try to implement it without LRU deletion, because LRU goes a long way towards reducing the load on the disk anyway. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
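The throttled LRU deletion described above can be sketched roughly like this. The map of entries, the size accounting, and the sleep-based pacing are simplified stand-ins, not the real distributed-cache manager code; the point is only the shape of "evict in LRU order, pausing between deletes."

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough sketch: delete LRU entries until usage drops below a target,
// pausing between deletions so the disk is not flooded with requests.
public class ThrottledLruCleaner {

    // lruEntries maps entry name -> size in bytes, in least-recently-used
    // order (oldest first). Returns the names that were "deleted".
    public static List<String> clean(LinkedHashMap<String, Long> lruEntries,
                                     long targetBytes,
                                     long pauseMillis) {
        List<String> deleted = new ArrayList<>();
        long used = lruEntries.values().stream().mapToLong(Long::longValue).sum();
        Iterator<Map.Entry<String, Long>> it = lruEntries.entrySet().iterator();
        while (used > targetBytes && it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            used -= e.getValue();
            deleted.add(e.getKey());
            it.remove();  // stand-in for the actual file delete
            if (used > targetBytes) {
                try {
                    Thread.sleep(pauseMillis);  // the throttle: spread deletes out
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return deleted;
    }
}
```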
[jira] [Created] (MAPREDUCE-2926) 500 Error in ResourceManager UI
500 Error in ResourceManager UI --- Key: MAPREDUCE-2926 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2926 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0, 0.24.0 When accessing the resource manager UI the following is returned {noformat} Problem accessing /. Reason: org.codehaus.jackson.type.JavaType.init(Ljava/lang/Class;)V Caused by: java.lang.NoSuchMethodError: org.codehaus.jackson.type.JavaType.init(Ljava/lang/Class;)V at org.codehaus.jackson.map.type.TypeBase.init(TypeBase.java:15) at org.codehaus.jackson.map.type.SimpleType.init(SimpleType.java:45) at org.codehaus.jackson.map.type.SimpleType.init(SimpleType.java:40) at org.codehaus.jackson.map.type.TypeBindings.clinit(TypeBindings.java:20) at org.codehaus.jackson.map.type.TypeFactory._fromType(TypeFactory.java:530) at org.codehaus.jackson.map.type.TypeFactory.type(TypeFactory.java:63) at org.codehaus.jackson.map.ObjectMapper.clinit(ObjectMapper.java:179) at org.apache.hadoop.yarn.webapp.Controller.clinit(Controller.java:43) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.google.inject.DefaultConstructionProxyFactory$2.newInstance(DefaultConstructionProxyFactory.java:81) at com.google.inject.ConstructorInjector.construct(ConstructorInjector.java:85) at com.google.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:111) at com.google.inject.InjectorImpl$4$1.call(InjectorImpl.java:758) at com.google.inject.InjectorImpl.callInContext(InjectorImpl.java:804) at com.google.inject.InjectorImpl$4.get(InjectorImpl.java:754) at 
com.google.inject.InjectorImpl.getInstance(InjectorImpl.java:793) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:136) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:216) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:141) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:63) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:892) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Powered by Jetty:// {noformat} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2927) CompletedJob.isUber throws a Yarn exception which makes the JobHistory UI unusable.
CompletedJob.isUber throws a Yarn exception which makes the JobHistory UI unusable. --- Key: MAPREDUCE-2927 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2927 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0, 0.24.0 CompletedJob.isUber on the MR-279 branch returns jobInfo.getIsUber() but got turned into an exception when MR-279 was merged to trunk. SVN Revision 1159166. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2913) TestMRJobs.testFailingMapper does not assert the correct thing.
TestMRJobs.testFailingMapper does not assert the correct thing. --- Key: MAPREDUCE-2913 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2913 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 0.23.0, 0.24.0 Reporter: Robert Joseph Evans Fix For: 0.23.0, 0.24.0 {code} Assert.assertEquals(TaskCompletionEvent.Status.FAILED, events[0].getStatus().FAILED); Assert.assertEquals(TaskCompletionEvent.Status.FAILED, events[1].getStatus().FAILED); {code} when simplified is equivalent to {code} Assert.assertEquals(TaskCompletionEvent.Status.FAILED, TaskCompletionEvent.Status.FAILED); Assert.assertEquals(TaskCompletionEvent.Status.FAILED, TaskCompletionEvent.Status.FAILED); {code} Obviously these assertions can never fail. If we remove the {{.FAILED}}, the asserts no longer pass. This could be because MRApp mocks out the task launcher and never actually launches anything. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
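A self-contained illustration of why the original form can never fail: Java lets you read a static enum constant through an instance expression, so `events[0].getStatus().FAILED` is just `Status.FAILED` regardless of the event's actual status. The `Status` and `Event` types below are minimal stand-ins for the real `TaskCompletionEvent` types.

```java
// Demonstrates the broken vs. fixed assertion shape from the report.
public class AssertShapeDemo {

    public enum Status { FAILED, SUCCEEDED }

    public static class Event {
        private final Status status;
        public Event(Status s) { status = s; }
        public Status getStatus() { return status; }
    }

    // Broken form: e.getStatus().FAILED reads the static constant through
    // the instance, so both sides are always Status.FAILED.
    public static boolean brokenCheck(Event e) {
        return Status.FAILED == e.getStatus().FAILED;
    }

    // Fixed form: compare against the event's actual status.
    public static boolean fixedCheck(Event e) {
        return Status.FAILED == e.getStatus();
    }
}
```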
[jira] [Created] (MAPREDUCE-2876) ContainerAllocationExpirer appears to use the incorrect configs
ContainerAllocationExpirer appears to use the incorrect configs --- Key: MAPREDUCE-2876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2876 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 ContainerAllocationExpirer sets the expiration interval to be RMConfig.CONTAINER_LIVELINESS_MONITORING_INTERVAL but uses AMLIVELINESS_MONITORING_INTERVAL as the interval. This is very different from what AMLivelinessMonitor does. There should be two configs RMConfig.CONTAINER_LIVELINESS_MONITORING_INTERVAL for the monitoring interval and RMConfig.CONTAINER_EXPIRY_INTERVAL for the expiry. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2865) MRV2 Job.java needs javadocs in it.
MRV2 Job.java needs javadocs in it. --- Key: MAPREDUCE-2865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2865 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Fix For: 0.23.0 This may fall under another JIRA already filed, but Job.java in the MRv2 client needs to have javadocs in it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2864) Renaming of configuration property names in yarn
Renaming of configuration property names in yarn Key: MAPREDUCE-2864 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2864 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2, nodemanager, resourcemanager Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Now that YARN has been put into trunk we should do something similar to MAPREDUCE-849. We should go back and look at all of the configurations that have been added and rename them as needed so that they are consistent and subdivided by component. # We should use all lowercase in the config names, e.g., appsmanager instead of appsManager. # History server config names should be prefixed with mapreduce instead of yarn. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2772) MR-279: mrv2 no longer compiles against trunk after common mavenization.
MR-279: mrv2 no longer compiles against trunk after common mavenization. Key: MAPREDUCE-2772 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2772 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: yarn-common-mvn.patch mrv2 no longer compiles against trunk after common mavenization -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2756) JobControl can drop jobs if an error occurs
JobControl can drop jobs if an error occurs --- Key: MAPREDUCE-2756 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2756 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Minor Fix For: 0.23.0 If you run a Pig job with UDFs that have not been recompiled for MRv2, there are situations where Pig will fail with an error message stating that Hadoop failed and did not give a reason. There is even the possibility of deadlock if an Error is thrown and the JobControl thread dies. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
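One common guard against this kind of silent drop is to catch Throwable (not just Exception) in the monitoring loop and record a failure reason on the job instead of letting an Error kill the thread. This is a hypothetical sketch with stand-in types, not the actual JobControl code.

```java
import java.util.List;

// Sketch: a poll step that records a failure reason instead of letting
// an Error kill the monitoring thread and silently drop the job.
public class JobControlSketch {

    // Hypothetical stand-in for a controlled job's bookkeeping.
    public static class ControlledJob {
        public final String name;
        public String state = "WAITING";
        public String message = "";
        public ControlledJob(String n) { name = n; }
    }

    // check stands in for submitting/polling the underlying job; any
    // Throwable (including Error) marks the job FAILED with a reason.
    public static void pollOnce(List<ControlledJob> jobs, Runnable check) {
        for (ControlledJob j : jobs) {
            try {
                check.run();
                j.state = "RUNNING";
            } catch (Throwable t) {
                j.state = "FAILED";
                j.message = "Job failed: " + t;
            }
        }
    }
}
```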
[jira] [Created] (MAPREDUCE-2723) MR-279: port MAPREDUCE-2324 to mrv2
MR-279: port MAPREDUCE-2324 to mrv2 --- Key: MAPREDUCE-2723 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2723 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 MRV2 currently does not take reduce disk usage into account when trying to schedule a container. For feature parity with the original map reduce it should be extended to allow for disk space requests within containers along with RAM requests. We then also need to port MAPREDUCE-2324 to the scheduler to allow it to avoid starvation of containers that might never get the resources that they need. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-2572. Resolution: Won't Fix I filed this, but the more I think about it, setting the amount of the distributed cache to keep around between cleanings to a high number really seems like the best way to deal with this. Since it is just a configuration value there is no need to make any changes to the code, so I will just close this as Won't Fix. Throttle the deletion of data from the distributed cache Key: MAPREDUCE-2572 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.20.205.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: THROTTLING-security-v1.patch When deleting entries from the distributed cache we do so in a background thread. Once the size limit of the distributed cache is reached, all unused entries are deleted. MAPREDUCE-2494 changes this so that entries are deleted in LRU order until the usage falls below a given threshold. In either of these cases we are periodically flooding a disk with delete requests, which can slow down all IO operations to a drive. It would be better to throttle this deletion so that it is spread out over a longer period of time. This JIRA is to add that throttling. On investigating, it seems much simpler to backport MAPREDUCE-2494 to 20S before implementing this change rather than try to implement it without LRU deletion, because LRU goes a long way towards reducing the load on the disk anyway. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-2684) Job Tracker can starve reduces with very large input.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans resolved MAPREDUCE-2684. Resolution: Duplicate Job Tracker can starve reduces with very large input. - Key: MAPREDUCE-2684 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2684 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.204.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans If mapreduce.reduce.input.limit is mis-configured, or if a cluster is just running low on disk space in general, then reduces with a large input may never get scheduled, causing the job to never fail and never succeed; it just starves until the job is killed. The JobInProgress tries to guess at the size of the input to all reducers in a job. If the size is over mapreduce.reduce.input.limit then the job is killed. If it is not, then findNewReduceTask() checks to see if the estimated size is too big to fit on the node currently looking for work. If it is, then it will let some other task have a chance at the slot. The idea is to keep track of how often a reduce slot is rejected because of the lack of space vs. how often it succeeds, and then guess whether the reduce tasks will ever be scheduled. So I would like some feedback on this. 1) How should we guess? Someone who found the bug here suggested P1 + (P2 * S), where S is the number of successful assignments. Possibly P1 = 20 and P2 = 2.0. I am not really sure. 2) What should we do when we guess that it will never get a slot? Should we fail the job, or should we schedule it anyway, even though it might fail, and see if it really will fail? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
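The suggested guess can be sketched as a one-line heuristic. P1 and P2 below are the proposed (explicitly not settled) constants from the discussion above; everything else is a stand-in name for illustration.

```java
// Sketch of the proposed starvation guess: after S successful reduce
// assignments, tolerate up to P1 + P2 * S rejections for lack of disk
// space before concluding the remaining reduces may never be scheduled.
public class ReduceStarvationGuess {

    // Proposed values from the discussion; not settled.
    static final double P1 = 20.0;
    static final double P2 = 2.0;

    public static boolean looksStarved(long rejectedForSpace,
                                       long successfulAssignments) {
        return rejectedForSpace > P1 + P2 * successfulAssignments;
    }
}
```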
[jira] [Created] (MAPREDUCE-2672) MR-279: JobHistory Server needs Analysis this job
MR-279: JobHistory Server needs Analysis this job - Key: MAPREDUCE-2672 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2672 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 The JobHistory Server needs to implement the Analysis this job functionality from the previous server. This should include the following info Hadoop Job ID User : JobName : JobConf : Submitted At : Launched At : (including duration) Finished At : (including duration) Status : Time taken by best performing Map task TASK_LINK: Average time taken by Map tasks: Worse performing map tasks: (including task links and duration) The last Map task TASK_LINK finished at (relative to the Job launch time): (including duration) Time taken by best performing shuffle TASK_LINK: Average time taken by shuffle: Worse performing Shuffles: (including task links and duration) The last Shuffle TASK_LINK finished at (relative to the Job launch time): (including duration) Time taken by best performing Reduce task TASK_LINK: Average time taken by Reduce tasks: Worse performing reduce tasks: (including task links and duration) The last Reduce task TASK_LINK finished at (relative to the Job launch time): (including duration) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2673) MR-279: JobHistory Server should not refresh
MR-279: JobHistory Server should not refresh Key: MAPREDUCE-2673 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2673 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Priority: Minor Fix For: 0.23.0 The Job History Server UI is based off of the Application Master UI, which refreshes the page for jobs regularly. The page should not refresh at all for the JobHistroy, because the job has finished and is not changing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2674) MR-279: JobHistory Server should not use tables for layout
MR-279: JobHistory Server should not use tables for layout -- Key: MAPREDUCE-2674 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2674 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Priority: Minor The Job History Server web pages use table tags for the layout of the various elements on the page. This is not a very maintainable way of laying out a web page. The ideal is to let CSS do all of the layout and have the document itself just have data in it. This is especially important because there are currently no APIs to pull some of this data out, and as such there are tools, that scrape these pages. If we can separate out the layout then even when the layout changes the scrapers will not be impacted. This should probably be investigated in the rest of the UI too. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2675) MR-279: JobHistory Server main page needs to be reformatted
MR-279: JobHistory Server main page needs to be reformatted --- Key: MAPREDUCE-2675 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2675 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans The main page of the Job History Server is based off of the Application Master code. It needs to be reformatted to be more useful and better match what was there before. - The Active Jobs title needs to be replaced with something more appropriate (i.e. Retired Jobs) - The table of jobs should have the following columns in it - Submit time, Job Id, Job Name, User and just because I think it would be useful state, maps completed, maps failed, reduces completed, reduces failed - The table needs more advanced filtering, something like http://datatables.net/release-datatables/examples/api/multi_filter.html This is to match the previous search functionality. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2676) MR-279: JobHistory Job page needs reformatted
MR-279: JobHistory Job page needs reformatted - Key: MAPREDUCE-2676 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2676 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans The Job page, The Maps page and the Reduces page for the job history server needs to be reformatted. The Job Overview needs to add in the User, a link to the Job Conf, and the Job ACLs It also needs Submitted at, launched at, and finished at, depending on how they relates to Started and Elapsed. In the attempts table we need to remove the new and the running columns In the tasks table we need to remove progress, pending, and running columns and add in a failed count column We also need to investigate what it would take to add in setup and cleanup statistics. Perhaps these should be more generally Application Master statistics and links. The Maps page and Reduces page should have the progress column removed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2668) MR-279: APPLICATION_STOP is never sent to AuxServices
MR-279: APPLICATION_STOP is never sent to AuxServices - Key: MAPREDUCE-2668 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2668 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Fix For: 0.23.0 APPLICATION_STOP is never sent to the AuxServices only APPLICATION_INIT. This means that all map intermediate data will never be deleted. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2666) MR-279: Need to retrieve shuffle port number on AplicationMaster restart
MR-279: Need to retrieve shuffle port number on AplicationMaster restart Key: MAPREDUCE-2666 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2666 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 MAPREDUCE-2652 allows ShuffleHandler to return the port it is operating on. In the case of an ApplicationMaster crash where it needs to be restarted that information is lost. We either need to re-query it from each of the NodeManagers or to persist it to the JobHistory logs and retrieve it again. The job history logs is probably the simpler solution. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2652) MR-279: Cannot run multiple NMs on a single node
MR-279: Cannot run multiple NMs on a single node - Key: MAPREDUCE-2652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2652 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Currently in MR-279 the Auxiliary services, like ShuffleHandler, have no way to communicate information back to the applications. Because of this the Map Reduce Application Master has hardcoded in a port of 8080 for shuffle. This prevents the configuration mapreduce.shuffle.port form ever being set to anything but 8080. The code should be updated to allow this information to be returned to the application master. Also the data needs to be persisted to the task log so that on restart the data is not lost. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2659) MR-279: ShuffleHandler should use Protocol Buffers for ServiceData
MR-279: ShuffleHandler should use Protocol Buffers for ServiceData -- Key: MAPREDUCE-2659 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2659 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Fix For: 0.23.0 Auxiliary Services (Specifically ShuffleHandler) should use ProtocolBuffers for storing/retrieving data in the ByteBuffer. Right now there are TODOs to have the format include a version number, but if we want true wire compatibility we should use the same system we are using elsewhere in the code for messages, not something invented as we go along. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-2494: Reopening to add in patch for 0.20.2XX branch Make the distributed cache delete entires using LRU priority Key: MAPREDUCE-2494 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.21.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch Currently the distributed cache will wait until a cache directory is above a preconfigured threshold. At which point it will delete all entries that are not currently being used. It seems like we would get far fewer cache misses if we kept some of them around, even when they are not being used. We should add in a configurable percentage for a goal of how much of the cache should remain clear when not in use, and select objects to delete based off of how recently they were used, and possibly also how large they are/how difficult is it to download them again. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache
Throttle the deletion of data from the distributed cache Key: MAPREDUCE-2572 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.20.205.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans When deleting entries from the distributed cache we do so in a background thread. Once the size limit of the distributed cache is reached all unused entries are deleted. MAPREDUCE-2494 changes this so that entries are deleted in LRU order until the usage falls below a given threshold. In either of these cases we are periodically flooding a disk with delete requests which can slow down all IO operations to a drive. It would be better to be able to throttle this deletion so that it is spread out over a longer period of time. This jira is to add in this throttling. On investigating it seems much simpler to backport MPAREDUCE-2494 to 20S before implementing this change rather then try to implement it without LRU deletion, because LRU goes a long way towards reducing the load on the disk anyways. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (MAPREDUCE-2535) JobClient creates a RunningJob with null status and profile
[ https://issues.apache.org/jira/browse/MAPREDUCE-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans reopened MAPREDUCE-2535: The fix is good, but it broke the system tests. Reopening the bug to add in a patch to fix the tests. JobClient creates a RunningJob with null status and profile --- Key: MAPREDUCE-2535 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2535 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.20.204.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Attachments: MR-2535-0.20.20X-V1.patch Exception occurred because the job was retired and is removed from RetireJobCcahe and CompletedJobStatusStore. But, the JobClient creates a RunningJob with null status and profile, if getJob(JobID) is called again. So, Even-though not null check is there in the following user code, it did not help. 466 runningJob = jobClient.getJob(mapRedJobID); 467 if(runningJob != null) { JobClient.getJob() should return null if status is null. In trunk this is fixed by validating that the job status is not null every time it is updated, and also verifying that that the profile data is not null when created. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2539) NPE when calling JobClient.getMapTaskReports for retired job
NPE when calling JobClient.getMapTaskReports for retired job Key: MAPREDUCE-2539 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2539 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.22.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans When calling JobClient.getMapTaskReports for a retired job this results in a NPE. In the 0.20.* version an empty TaskReport array was returned instead. Caused by: java.lang.NullPointerException at org.apache.hadoop.mapred.JobClient.getMapTaskReports(JobClient.java:588) at org.apache.pig.tools.pigstats.JobStats.addMapReduceStatistics(JobStats.java:388) .. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
Make the distributed cache delete entires using LRU priority Key: MAPREDUCE-2494 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.21.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Currently the distributed cache will wait until a cache directory is above a preconfigured threshold. At which point it will delete all entries that are not currently being used. It seems like we would get far fewer cache misses if we kept some of them around, even when they are not being used. We should add in a configurable percentage for a goal of how much of the cache should remain clear when not in use, and select objects to delete based off of how recently they were used, and possibly also how large they are/how difficult is it to download them again. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has dies for some reason
The distributed cache cleanup thread has no monitoring to check to see if it has dies for some reason - Key: MAPREDUCE-2495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.21.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Priority: Minor The cleanup thread in the distributed cache handles IOExceptions and the like correctly, but just to be a bit more defensive it would be good to monitor the thread, and check that it is still alive regularly, so that the distributed cache does not fill up the entire disk on the node. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2479) Backport MAPREDUCE-1568 to hadoop security branch
Backport MAPREDUCE-1568 to hadoop security branch - Key: MAPREDUCE-2479 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2479 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira