[jira] [Reopened] (MAPREDUCE-6288) mapred job -status fails with AccessControlException

2015-03-30 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-6288:


I still don't consider this fixed.  I'm willing to let MR shoot itself in the 
foot and expose the HDFS layout of the history server.  [~jlowe] may disagree 
with me because he still has to support this code, but please test this fix 
with a user that has ACL permissions to read the job status but did not launch 
the job.

Whether the job is still up and running or not, it will fail with a permission 
denied error, because the original job owner is the only user in HDFS that has 
permission to read the config file.

 mapred job -status fails with AccessControlException 
 -

 Key: MAPREDUCE-6288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6288-gera-001.patch, MAPREDUCE-6288.002.patch, 
 MAPREDUCE-6288.patch


 After MAPREDUCE-5875, we're seeing this Exception when trying to do {{mapred 
 job -status job_1427080398288_0001}}
 {noformat}
 Exception in thread main org.apache.hadoop.security.AccessControlException: 
 Permission denied: user=jenkins, access=EXECUTE, 
 inode=/user/history/done:mapred:hadoop:drwxrwx---
   at 
 org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
   at 
 org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
   at 
 org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
   at 
 org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6553)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6535)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6460)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1919)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1870)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1850)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1822)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:545)
   at 
 org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   at 
 org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1213)
   at 
 org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1201)
   at 
 

[jira] [Resolved] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' suffix

2013-07-19 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-5247.


Resolution: Won't Fix

I am happy to hear arguments as to why this is really necessary, but I would 
rather have my job fail than have the job give me partial/inconsistent results.

 FileInputFormat should filter files with '._COPYING_' suffix
 ---

 Key: MAPREDUCE-5247
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Stan Rosenberg

 FsShell copy/put creates staging files with a '._COPYING_' suffix.  These files 
 should be considered hidden by FileInputFormat.  A simple fix is to add the 
 following conjunct to the existing hiddenFilter: 
 {code}
 !name.endsWith("._COPYING_")
 {code}
 After upgrading to CDH 4.2.0 we encountered this bug.  We have a legacy data 
 loader which uses 'hadoop fs -put' to load data into hourly partitions.  We 
 also have intra-hourly jobs which are scheduled to execute several times per 
 hour using the same hourly partition as input.  Thus, as new data is 
 continuously loaded, these staging files (i.e., ._COPYING_) break our 
 jobs (since the staging files are renamed away once the copy/put completes).
 As a workaround, we've defined a custom input path filter and loaded it with 
 mapred.input.pathFilter.class.
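
 For reference, a minimal sketch of the kind of custom path filter described 
 above; the class name is illustrative, not the reporter's actual filter, and 
 it assumes registration via mapred.input.pathFilter.class or 
 FileInputFormat.setInputPathFilter():
{code:title=IgnoreCopyingFilesFilter.java (illustrative sketch)}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Hides FsShell staging files, in addition to the usual dot/underscore
// "hidden" files, so in-flight copies are never picked up as input splits.
public class IgnoreCopyingFilesFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    return !name.startsWith("_")
        && !name.startsWith(".")
        && !name.endsWith("._COPYING_");
  }
}
{code}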



[jira] [Reopened] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-04-01 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-4974:



I am sorry about reopening this, but I did not look at it closely enough 
before I put it in.

The compression code cannot be moved.  isCompressedInput() uses the value of 
codec internally.  After this change compression is always off for every input 
format, because codec is never set and is always null. I am happy to leave the 
other half of the change in place. I will push the change to subversion shortly.
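
A condensed sketch of the relevant part of LineRecordReader, to show why the 
codec assignment has to stay in initialize(); this is a simplification for 
illustration, not a verbatim copy of the class:

{code:title=LineRecordReader.java (condensed sketch)}
private CompressionCodec codec;

public void initialize(InputSplit genericSplit, TaskAttemptContext context)
    throws IOException {
  FileSplit split = (FileSplit) genericSplit;
  // codec must be assigned here: isCompressedInput() below keys off this
  // field, so skipping the assignment silently disables compressed input
  // for every split.
  CompressionCodecFactory compressionCodecs =
      new CompressionCodecFactory(context.getConfiguration());
  codec = compressionCodecs.getCodec(split.getPath());
  // ... open the file, wrap the stream with the codec if present, seek ...
}

private boolean isCompressedInput() {
  return (codec != null);
}
{code}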

 Optimising the LineRecordReader initialize() method
 ---

 Key: MAPREDUCE-4974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, performance
Affects Versions: 2.0.2-alpha, 0.23.5
 Environment: Hadoop Linux
Reporter: Arun A K
Assignee: Gelesh
  Labels: patch, performance
 Fix For: 0.23.7, 2.0.5-beta

 Attachments: MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch, 
 MAPREDUCE-4974.4.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I found there is scope for optimizing the code in initialize(): the 
 compressionCodecs and codec fields need to be instantiated only if the input 
 is compressed.  Meanwhile, Gelesh George Omathil suggested that we could also 
 avoid the null check of key and value; this would save time, since the null 
 check is currently done for every next key/value generation.  The intention is 
 to instantiate them only once, in initialize(), and avoid an NPE as well.  
 Hope both could be addressed together; we have both worked on it.



[jira] [Created] (MAPREDUCE-5103) Remove dead code QueueManager and JobEndNotifier

2013-03-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-5103:
--

 Summary: Remove dead code QueueManager and JobEndNotifier
 Key: MAPREDUCE-5103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Robert Joseph Evans


There are a few classes that are dead or duplicate code at this point.  
org/apache/hadoop/mapred/JobEndNotifier.java

org/apache/hadoop/mapred/QueueManager.java
org/apache/hadoop/mapred/QueueConfigurationParser.java
org/apache/hadoop/mapred/DeprecatedQueueConfigurationParser.java

LocalRunner is currently using the JobEndNotifier, but there is a replacement 
for it in MRv2: org.apache.hadoop.mapreduce.v2.app.JobEndNotifier.  The two 
should be combined and the duplicate code removed.

There appears to be only one method called on the QueueManager, and it appears 
to be setting a property that is not used any more, so it can be removed.






[jira] [Created] (MAPREDUCE-5104) Deprecate and Remove PathCleanupQueue

2013-03-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-5104:
--

 Summary: Deprecate and Remove PathCleanupQueue
 Key: MAPREDUCE-5104
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5104
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Robert Joseph Evans


CleanupQueue and InlineCleanupQueue appear to be dead code.  However, they are 
not marked as private and InlineCleanupQueue is part of UtilsForTests, so they 
could be used by other projects as part of their testing.

We should deprecate these in branch-2 and remove them from trunk. There is no 
point in continuing to test dead code.



[jira] [Resolved] (MAPREDUCE-4354) Performance improvement with compressor object reinit restriction

2013-03-21 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4354.


Resolution: Invalid

As this is an issue for a different project I am closing this as invalid.  The 
latest LZO code actually looks very similar to this patch so it may not be 
necessary for newer versions of the LzoCompressor.

 Performance improvement with compressor object reinit restriction
 -

 Key: MAPREDUCE-4354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4354
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance
Affects Versions: 0.20.205.0
Reporter: Ankit Kamboj
Priority: Minor
  Labels: performance
 Fix For: 0.20.205.0

 Attachments: codec_reinit_diff, modify_lzo_codec_reinit


 HADOOP-5879 patch aimed at picking the conf (instead of default) settings for 
 GzipCodec. It also involved re-initializing the recycled compressor object. 
 On our performance tests, this re-initialization led to performance 
 degradation of 15% for LzoCodec because re-initialization for Lzo involves 
 reallocation of buffers. LzoCodec takes the initial settings from config so 
 it is not necessary to re-initialize it. This patch checks for the codec 
 class and calls reinit only if the codec class is Gzip. This led to 
 significant performance improvement of 15% for LzoCodec.



[jira] [Created] (MAPREDUCE-5082) CodecPool should avoid OOMs with buggy codecs

2013-03-19 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-5082:
--

 Summary: CodecPool should avoid OOMs with buggy codecs
 Key: MAPREDUCE-5082
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5082
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans


I recently found a bug in the gpl compression libraries that was causing map 
tasks for a particular job to OOM.

https://github.com/omalley/hadoop-gpl-compression/issues/3

Now granted it does not make a lot of sense for a job to use the LzopCodec for 
map output compression over the LzoCodec, but arguably other codecs could be 
doing similar things and causing the same sort of memory leaks.  I propose that 
we do a sanity check when creating a new decompressor/compressor: if the 
codec's newly created object does not match the value from getType... it should 
turn off caching for that codec.
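
A rough sketch of the proposed sanity check, assuming it keys off 
getCompressorType(); the class and method names are illustrative, not an 
actual patch:

{code:title=Sanity check sketch (illustrative)}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;

final class CompressorPoolGuard {
  // Codecs whose compressors were observed to mismatch their advertised type;
  // pooling is disabled for these so mismatched instances cannot pile up.
  private static final Set<Class<?>> DO_NOT_CACHE =
      Collections.newSetFromMap(new ConcurrentHashMap<Class<?>, Boolean>());

  static Compressor create(CompressionCodec codec) {
    Compressor compressor = codec.createCompressor();
    if (compressor != null
        && !codec.getCompressorType().isInstance(compressor)) {
      DO_NOT_CACHE.add(codec.getClass());
    }
    return compressor;
  }

  static boolean isCacheable(CompressionCodec codec) {
    return !DO_NOT_CACHE.contains(codec.getClass());
  }
}
{code}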



[jira] [Reopened] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2013-03-19 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-5028:



I reverted the changes from branch-0.23 too.  Reopening so we can take a look 
at how to fix the patch.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0, 0.23.7, 2.0.5-beta

 Attachments: mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, 
 org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}



[jira] [Created] (MAPREDUCE-5060) Fetch failures that time out only count against the first map task

2013-03-12 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-5060:
--

 Summary: Fetch failures that time out only count against the first 
map task
 Key: MAPREDUCE-5060
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5060
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans
Priority: Critical


When a fetch failure happens, if the socket has already connected it is only 
counted against the first map task.  But most of the time it is because of an 
issue with the node itself, not the individual map task, and as such all 
failures when trying to initiate the connection should count against all of the 
tasks.

This caused a particularly unfortunate job to take an hour and a half longer 
than it needed to.



[jira] [Resolved] (MAPREDUCE-5051) Combiner not used when NUM_REDUCES=0

2013-03-07 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-5051.


Resolution: Won't Fix

If you feel strongly that this should be supported you can reopen this JIRA as 
new feature work.

 Combiner not used when NUM_REDUCES=0
 

 Key: MAPREDUCE-5051
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5051
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 2.0.2-alpha
 Environment: CDH4.1.2 MR1
Reporter: Damien Hardy

 We have an M/R job that uses a Mapper + Combiner but has nothing to do in the 
 Reducer:
 bulk indexing of HBase data in ElasticSearch.
 Map output is K / V : #bulk / json_data_to_be_indexed.
 So the job is launched, maps work, combiners index, and a reducer is created 
 for nothing (sometimes waiting for another M/R job to free a tasktracker slot 
 for the reducer, cf. MAPREDUCE-5019).
 When we put ```job.setNumReduceTasks(0);``` in our job's run(), mappers are 
 started but combiners are not used.



[jira] [Resolved] (MAPREDUCE-4940) division by zero in getLocalPathForWrite()

2013-01-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4940.


Resolution: Duplicate

 division by zero in getLocalPathForWrite()
 --

 Key: MAPREDUCE-4940
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4940
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ted Yu

 see 
 https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/
 {code}
 2013-01-12 11:53:52,809 WARN  [AsyncDispatcher event handler] 
 resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application 
 Finished - Failed TARGET=RMAppManager RESULT=FAILURE  DESCRIPTION=App 
 failed with state: FAILED   PERMISSIONS=Application 
 application_1357991604658_0002 failed 1 times due to AM Container for 
 appattempt_1357991604658_0002_01 exited with  exitCode: -1000 due to: 
 java.lang.ArithmeticException: / by zero
   at 
 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
   at 
 org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
   at 
 org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
 .Failing this attempt.. Failing the application.  
 APPID=application_1357991604658_0002
 {code}
 Here is related code:
 {code}
 // Keep rolling the wheel till we get a valid path
 Random r = new java.util.Random();
 while (numDirsSearched < numDirs && returnPath == null) {
   long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
 {code}
 My guess is that totalAvailable was 0, meaning dirDF was empty.
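
 A minimal sketch of the kind of guard that would avoid the ArithmeticException, 
 assuming the right fix is to fail with a descriptive error when no local 
 directory has capacity; names are illustrative:
{code:title=Guard sketch (illustrative)}
import java.util.Random;

final class LocalDirPicker {
  // Mirrors the loop above: pick a random starting point among the local
  // directories, but refuse to do so when no capacity was reported at all.
  static long randomStart(Random r, long totalAvailable) {
    if (totalAvailable <= 0L) {
      throw new IllegalStateException(
          "No space available in any of the configured local directories");
    }
    return Math.abs(r.nextLong()) % totalAvailable;
  }
}
{code}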



[jira] [Created] (MAPREDUCE-4912) Investigate ways to clean up double job commit prevention

2013-01-04 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4912:
--

 Summary: Investigate ways to clean up double job commit prevention
 Key: MAPREDUCE-4912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Robert Joseph Evans


Once MAPREDUCE-4819 goes in it fixes the issue where an OutputCommitter can 
double commit a job, so that the output will never be touched after the job 
externally reports success or failure.

The code and design could potentially use some cleanup and refactoring.

Issues brought up that should be investigated include:

# reporting KILL for killed jobs if they crash after the kill happens instead 
of error.
# using the job history log for recording the commit status instead of separate 
external files in HDFS.
# Placing the recovery/retry logic in the commit handler instead of the 
MRAppMaster, and having the recovery service replay the logs as it normally 
does for recovery.

This is not meant to be things that must be done, but alternatives that might 
clean up the code.



[jira] [Created] (MAPREDUCE-4901) JobHistoryEventHandler errors should be fatal

2012-12-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4901:
--

 Summary: JobHistoryEventHandler errors should be fatal
 Key: MAPREDUCE-4901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4901
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.0.0-alpha, 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


To be able to truly fix issues like MAPREDUCE-4819 and MAPREDUCE-4832, we need 
a two-phase commit where a subsequent AM can be sure that, at a specific point 
in time, it knows exactly whether any tasks/jobs are committing.  The job 
history log is already used for similar functionality so we would like to reuse 
it, but we need to be sure that errors while writing out to the job history log 
are treated as fatal.



[jira] [Created] (MAPREDUCE-4888) NLineInputFormat drops data in 1.1 and beyond

2012-12-18 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4888:
--

 Summary: NLineInputFormat drops data in 1.1 and beyond
 Key: MAPREDUCE-4888
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4888
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.0
Reporter: Robert Joseph Evans
Priority: Blocker


When trying to root cause why MAPREDUCE-4782 did not cause us issues on 1.0.2, 
I found out that HADOOP-7823 introduced essentially the exact same error into 
org.apache.hadoop.mapred.lib.NLineInputFormat.

In 1.X org.apache.hadoop.mapred.lib.NLineInputFormat and 
org.apache.hadoop.mapreduce.lib.input.NLineInputFormat are separate 
implementations.  The latter had an off-by-one error in it until MAPREDUCE-4782 
fixed it.  The former had no error in it until HADOOP-7823 introduced it in 1.1 
and MAPREDUCE-375 combined the implementations together but picked the 
implementation with the off-by-one error in 0.21.

I will attach a patch that exposes the error.



[jira] [Created] (MAPREDUCE-4832) MR AM can get in a split brain situation

2012-11-29 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4832:
--

 Summary: MR AM can get in a split brain situation
 Key: MAPREDUCE-4832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Robert Joseph Evans


It is possible for a networking issue to happen where the RM thinks an AM has 
gone down and launches a replacement, but the previous AM is still up and 
running.  If the previous AM does not need any more resources from the RM it 
could try to commit either tasks or jobs.  This could cause lots of problems 
where the second AM finishes and tries to commit too.  This could result in 
data corruption.  



[jira] [Created] (MAPREDUCE-4833) Task can get stuck in FAIL_CONTAINER_CLEANUP

2012-11-29 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4833:
--

 Summary: Task can get stuck in FAIL_CONTAINER_CLEANUP
 Key: MAPREDUCE-4833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans
Priority: Critical


If an NM goes down and the AM still tries to launch a container on it, the 
ContainerLauncherImpl can get stuck in an RPC timeout.  At the same time the RM 
may notice that the NM has gone away and inform the AM of this, which triggers 
a TA_FAILMSG.  If the TA_FAILMSG arrives at the TaskAttemptImpl before the 
TA_CONTAINER_LAUNCH_FAILED message then the task attempt will try to kill the 
container, but the ContainerLauncherImpl will not send back a 
TA_CONTAINER_CLEANED event, causing the attempt to be stuck.



[jira] [Created] (MAPREDUCE-4822) Unnecessary conversions in History Events

2012-11-27 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4822:
--

 Summary: Unnecessary conversions in History Events
 Key: MAPREDUCE-4822
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4822
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Priority: Trivial


There are a number of conversions in the Job History Event classes that are 
totally unnecessary.  It appears that they were originally used to convert from 
the internal avro format, but now many of them do not pull the values from the 
avro; they store them internally.

For example:

{code:title=TaskAttemptFinishedEvent.java}
  /** Get the task type */
  public TaskType getTaskType() {
return TaskType.valueOf(taskType.toString());
  }
{code}

The code currently takes an enum, converts it to a string, and then asks the 
same enum class to convert it back to an enum.  If Java is working properly this 
should be a no-op and a reference to the original taskType should be returned.
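
Assuming the taskType field already holds a TaskType (as the snippet above 
implies), the method could presumably be reduced to a direct return; a sketch 
of that simplification:

{code:title=TaskAttemptFinishedEvent.java (possible simplification, sketch)}
  /** Get the task type */
  public TaskType getTaskType() {
    // taskType is already the enum, so no String round trip is needed.
    return taskType;
  }
{code}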

There are several places where toString() is called on a String, and since 
strings are immutable it just returns a reference to itself.

The various ids are not immutable and probably should not be changed at this 
point.



[jira] [Resolved] (MAPREDUCE-4480) T_ATTEMPT_KILLED after SUCCEEDED can happen for reduces too

2012-11-15 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4480.


Resolution: Not A Problem

This appears to no longer be a problem.  Several JIRAs have modified the 
MapRetroactiveKilledTransition to the point that now the only time this can 
happen is if a successful reduce task attempt is killed after it has succeeded, 
which should never happen, because the reduce succeeded.

 T_ATTEMPT_KILLED after SUCCEEDED can happen for reduces too 
 

 Key: MAPREDUCE-4480
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4480
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.2-alpha
Reporter: Robert Joseph Evans
Priority: Critical

 This does not seem to impact 0.23.  If speculative execution is enabled then 
 a T_ATTEMPT_KILLED event can come in after the task has transitioned to 
 SUCCEEDED.  This causes the MapRetroactiveKilledTransition to kill the Job, 
 because it expects to only handle map tasks.



[jira] [Created] (MAPREDUCE-4775) Reducer will never commit suicide

2012-11-06 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4775:
--

 Summary: Reducer will never commit suicide
 Key: MAPREDUCE-4775
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4775
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical


In 1.0 there are a number of conditions that will cause a reducer to commit 
suicide and exit.

This includes if it is stalled, or if the error percentage of total fetches is 
too high.  In the new code it will only commit suicide when the total number of 
failures for a single task attempt is >= max(30, totalMaps/10).  In the best 
case, with the quadratic back-off, getting a single map attempt to reach 30 
failures would take 20.5 hours.  And unless there is only one reducer running, 
the map task would have been restarted before then.

We should go back to including the same reducer suicide checks that are in 1.0.



[jira] [Created] (MAPREDUCE-4772) Fetch failures can take way too long for a map to be restarted

2012-11-05 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4772:
--

 Summary: Fetch failures can take way too long for a map to be 
restarted
 Key: MAPREDUCE-4772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.4
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical


In one particular case we saw an NM go down at just the right time, such that 
most of the reducers got the output of the map tasks, but not all of them.

The ones that failed to get the output reported to the AM rather quickly that 
they could not fetch from the NM, but because the other reducers were still 
running the AM would not relaunch the map task because there weren't more than 
50% of the running reducers that had reported fetch failures.  Then because of 
the exponential back-off for fetches on the reducers it took until 1 hour 45 
min for the reduce tasks to hit another 10 fetch failures and report in again. 
At that point the other reducers had finished and the job relaunched the map 
task.  If the reducers had still been running at 1:45 I have no idea how long 
it would have taken for each of the tasks to get to 30 fetch failures.

We need to trigger the map relaunch based off of the percentage of reducers 
shuffling, not the percentage of reducers running.  We also need a maximum 
limit on the back-off, so that we don't ever have a reducer waiting for days to 
try to fetch map output.  
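
A small sketch of the capped back-off idea from the previous paragraph, with 
illustrative names and constants (not the actual shuffle code):

{code:title=Capped back-off sketch (illustrative)}
final class FetchBackOff {
  // Exponential growth per failure, but never wait longer than maxDelayMs,
  // so a reducer cannot end up sleeping for days between fetch attempts.
  static long delayMs(int failures, long baseDelayMs, long maxDelayMs) {
    long shift = Math.min(failures, 20);   // bound the shift to avoid overflow
    long delay = baseDelayMs * (1L << shift);
    return Math.min(delay, maxDelayMs);
  }
}
{code}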



[jira] [Created] (MAPREDUCE-4766) in diagnostics task ids and task attempt ids should become clickable links

2012-11-01 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4766:
--

 Summary: in diagnostics task ids and task attempt ids should 
become clickable links
 Key: MAPREDUCE-4766
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4766
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.5
Reporter: Robert Joseph Evans


It would be great if, when we see a task id or a task attempt id in the 
diagnostics, we changed it into a clickable link.



[jira] [Created] (MAPREDUCE-4760) Make a version of Counters that is composite for the job and stores the counter values in arrays.

2012-10-31 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4760:
--

 Summary: Make a version of Counters that is composite for the job 
 and stores the counter values in arrays.
 Key: MAPREDUCE-4760
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4760
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.4, 2.0.2-alpha
Reporter: Robert Joseph Evans
Priority: Minor


String interning reduced the size of counters a lot.  After that and the fix 
for a memory leak in the IPC server, a job with 20,000 map tasks and 3,000 
reducers takes about 200MB to store the state of all of the tasks.  Looking at 
a memory dump of the AM, each task attempt has a pointer to a Counters object 
that is about 2KB to 3KB in size.  That means Counters account for about 56MB 
of the 200MB of state.  This job only had about 40 task counters in it.  Each 
counter stores a long value, so if we stored them in a long[] instead we should 
only be taking up about 7MB (23,000 tasks * 40 counters * 8 bytes).

Also, assuming that some of the counters only appear in a map task or a reduce 
task, we should be able to have one CompositeCounters for map tasks and one for 
reduce tasks, which would reduce the size even further. 

NOTE: without this change I would expect to be able to run a 100,000 task job 
in the default 1024MB AM heap (875MB/200MB * 23,000), where I reserved 150MB 
for IPC buffers and event data.  With this change we could expect to run about 
130,000 tasks (875MB/150MB * 23,000).
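
A hypothetical sketch of the array-backed idea: the job keeps a single index of 
counter names, and each task attempt keeps only a long[] of values instead of a 
full Counters object.  None of these names come from an actual patch.

{code:title=Array-backed counters sketch (illustrative)}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Shared, job-wide mapping from counter name to array slot.
final class JobCounterIndex {
  private final Map<String, Integer> slots = new HashMap<String, Integer>();
  private final List<String> names = new ArrayList<String>();

  synchronized int slotFor(String counterName) {
    Integer slot = slots.get(counterName);
    if (slot == null) {
      slot = names.size();
      names.add(counterName);
      slots.put(counterName, slot);
    }
    return slot;
  }

  synchronized int size() {
    return names.size();
  }
}

// Per-task-attempt values: one long per known counter.
final class TaskCounterValues {
  private long[] values = new long[0];

  void increment(JobCounterIndex index, String counterName, long amount) {
    int slot = index.slotFor(counterName);
    if (slot >= values.length) {
      long[] grown = new long[index.size()];
      System.arraycopy(values, 0, grown, 0, values.length);
      values = grown;
    }
    values[slot] += amount;
  }

  long get(JobCounterIndex index, String counterName) {
    int slot = index.slotFor(counterName);
    return slot < values.length ? values[slot] : 0L;
  }
}
{code}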



[jira] [Created] (MAPREDUCE-4752) Reduce MR AM memory usage through String Interning

2012-10-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4752:
--

 Summary: Reduce MR AM memory usage through String Interning
 Key: MAPREDUCE-4752
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4752
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


There are a lot of strings that are duplicates of one another in the AM.  This 
comes from all of the PB events that come across the wire and also from tasks 
heart-beating in through the umbilical.  There are even several duplicates from 
Configuration.  By interning all of these strings on the heap I have been 
able to reduce the resting memory usage of the AM to about 5KB per task 
attempt, with about half of this coming from counters.  This results in a 5MB 
heap for a typical 1,000 task job, or a 500MB heap for a 100,000 task attempt 
job.  I think I could cut the size of the counters in half by completely 
rewriting how counters work in the AM and History Server, but I don't think it 
is worth it at this point.
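
A minimal sketch of the heap-based interning described above, assuming a plain 
map-backed pool rather than whatever the actual patch uses; the class name is 
illustrative:

{code:title=Heap string interning sketch (illustrative)}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class HeapStringInterner {
  private static final ConcurrentMap<String, String> POOL =
      new ConcurrentHashMap<String, String>();

  // Returns a canonical instance for the given string so that equal strings
  // parsed from RPC events or heartbeats share a single object on the heap
  // (rather than going through the JVM's String.intern() pool).
  static String intern(String s) {
    if (s == null) {
      return null;
    }
    String existing = POOL.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }
}
{code}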

I am still investigating what the memory usage of the AM is like when running 
very large jobs, and I will probably have a follow-up JIRA for reducing that 
memory usage as well.




[jira] [Resolved] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-10-24 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4549.


Resolution: Fixed

 Distributed cache conflicts breaks backwards compatability
 --

 Key: MAPREDUCE-4549
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical
 Fix For: 0.23.5

 Attachments: MR-4549-branch-0.23.txt


 I recently put in MAPREDUCE-4503, which went a bit too far and broke 
 backwards compatibility with 1.0 in distributed cache entries.  Instead of 
 changing the behavior of the distributed cache to more closely match 1.0 
 behavior, I want to just change the exception to a warning message informing 
 the users that it will become an error in 2.0.



[jira] [Resolved] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings

2012-10-24 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4303.


Resolution: Duplicate

 Look at using String.intern to dedupe some Strings
 --

 Key: MAPREDUCE-4303
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Robert Joseph Evans

 MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are 
 other places where it is not as simple to remove the duplicates.  In these 
 cases the source of the strings is an incoming RPC call or from parsing and 
 reading in a file.  The only real way to dedupe these is either to use 
 String.intern(), which if not used properly could result in the permgen space 
 being filled up, or to play games with our own cache, trying to do the 
 same sort of thing as String.intern but in the heap.
 The following are some places where I saw lots of duplicate strings that we 
 should look at doing something about:
 TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
 MapTaskAttemptImpl.diagnostics
 The keys to Counters.groups
 GenericGroup.displayName
 The keys to GenericGroup.counters
 and GenericCounter.displayName



[jira] [Created] (MAPREDUCE-4748) Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED

2012-10-24 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4748:
--

 Summary: Invalid event: T_ATTEMPT_SUCCEEDED at SUCCEEDED
 Key: MAPREDUCE-4748
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4748
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans


We saw this happen when running a large pig script.

{noformat}
2012-10-23 22:45:24,986 ERROR [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Can't handle this event 
at current state for task_1350837501057_21978_m_040453
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
T_ATTEMPT_SUCCEEDED at SUCCEEDED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:604)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:89)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:914)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:908)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:619)
{noformat}

Speculative execution was enabled, and that task did speculate, so it looks like 
this is an error in the state machine either between the task attempts or just 
within that single task.



[jira] [Created] (MAPREDUCE-4731) FSShell double encodes qualified Paths

2012-10-18 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4731:
--

 Summary: FSShell double encodes qualified Paths
 Key: MAPREDUCE-4731
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4731
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.2-alpha, 0.23.3
Reporter: Robert Joseph Evans


{noformat}
$ hadoop fs -mkdir /tmp/me
$ hadoop fs -touchz /tmp/me/A%3AB
$ hadoop fs -ls /tmp/me/A%3AB
Found 1 items
-rw---   3 me hdfs  0 2012-10-18 17:47 /tmp/me/A%3AB
$ hadoop fs -ls hdfs:///tmp/me/A%3AB
Found 1 items
-rw---   3 me hdfs  0 2012-10-18 17:47 hdfs:///tmp/me/A%253AB
$ hadoop fs -cat hdfs:///tmp/me/A%3AB
cat: File does not exist: /tmp/me/A%253AB
$ hadoop fs -cat /tmp/me/A%3AB
{noformat}



[jira] [Created] (MAPREDUCE-4647) We should only unjar jobjar if there is a lib directory in it.

2012-09-10 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4647:
--

 Summary: We should only unjar jobjar if there is a lib directory 
in it.
 Key: MAPREDUCE-4647
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4647
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


For backwards compatibility we recently made it so we would unjar the 
job.jar and add anything in the lib directory of that jar to the classpath.  
But this also slows job startup down a lot if the jar is large.  We should only 
unjar it if actually doing so would add something new to the classpath.
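
A rough sketch of the proposed check, assuming it is enough to scan the jar's 
entries for anything under lib/; the helper name is illustrative:

{code:title=Job-jar lib check sketch (illustrative)}
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

final class JobJarInspector {
  // Only a jar that actually contains lib/ entries would add something new
  // to the classpath when unpacked; everything else can stay packed.
  static boolean hasLibEntries(String jobJarPath) throws IOException {
    JarFile jar = new JarFile(jobJarPath);
    try {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        if (!entry.isDirectory() && entry.getName().startsWith("lib/")) {
          return true;
        }
      }
      return false;
    } finally {
      jar.close();
    }
  }
}
{code}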



[jira] [Created] (MAPREDUCE-4611) MR AM dies badly when Node is decommissioned

2012-08-30 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4611:
--

 Summary: MR AM dies badly when Node is decommissioned
 Key: MAPREDUCE-4611
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4611
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 0.23.3, 3.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


The MR AM always thinks that it is being killed by the RM when it gets a kill 
signal and it has not finished processing yet.  In reality the RM kill signal 
is only sent when the client cannot communicate directly with the AM, which 
probably means that the AM is in a bad state already.  The much more common 
case is that the node is marked as unhealthy or decommissioned.

I propose that in the short term the AM will only clean up if:

 # The process has been asked by the client to exit (kill)
 # The job has finished cleanly and is exiting already
 # This is the last retry of the AM retries.

The downside here is that the .staging directory will be leaked and the job 
will not show up in the history server on a kill from the RM in some cases.

At least until the full set of AM cleanup issues can be addressed, probably as 
part of MAPREDUCE-4428.



[jira] [Created] (MAPREDUCE-4600) TestTokenCache.java from MRV1 no longer compiles

2012-08-28 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4600:
--

 Summary: TestTokenCache.java from MRV1 no longer compiles
 Key: MAPREDUCE-4600
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4600
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha
Reporter: Robert Joseph Evans
Assignee: Daryn Sharp
Priority: Critical


{noformat}
[javac] hadoop-mapreduce-project/build.xml:569: warning: 
'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to 
false for repeatable builds
[javac] Compiling 95 source files to 
hadoop-mapreduce-project/build/test/mapred/classes
[javac] 
hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:291:
 cannot find symbol
[javac] symbol  : method 
getDelegationToken(org.apache.hadoop.security.Credentials,java.lang.String)
[javac] location: class org.apache.hadoop.mapreduce.security.TokenCache
 [javac] Token<DelegationTokenIdentifier> nnt = 
TokenCache.getDelegationToken(
[javac]  ^
[javac] 
hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:350:
 cannot find symbol
[javac] symbol  : method getDelegationTokens(java.lang.String)
[javac] location: class org.apache.hadoop.hdfs.HftpFileSystem
[javac]   }}).when(hfs).getDelegationTokens(renewer);
[javac]^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] 2 errors
{noformat}



[jira] [Resolved] (MAPREDUCE-4539) Please delete me

2012-08-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4539.


Resolution: Duplicate

 Please delete me
 

 Key: MAPREDUCE-4539
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4539
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Trivial

 I am in a bad state; will someone please delete me?





[jira] [Resolved] (MAPREDUCE-4538) Please delete me

2012-08-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4538.


Resolution: Duplicate

 Please delete me
 

 Key: MAPREDUCE-4538
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4538
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Trivial
 Attachments: MR-4538.txt


 I am in a bad state; will someone please delete me?





[jira] [Created] (MAPREDUCE-4549) Distributed cache conflicts breaks backwards compatability

2012-08-13 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4549:
--

 Summary: Distributed cache conflicts breaks backwards compatability
 Key: MAPREDUCE-4549
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4549
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.1.0-alpha, 3.0.0, 2.2.0-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical


I recently put in MAPREDUCE-4503, which went a bit too far and broke backwards 
compatibility with 1.0 in distributed cache entries.  This is to change the 
behavior of the distributed cache to more closely match that of 1.0.

In 1.0, when adding a cache archive link, the first link would win (be the one 
that was created), not the last one as is the current behavior.  When there 
were conflicts, all of the others were ignored and just did not get a symlink 
created.  Finally, no symlink was created for archives that did not have 
a fragment in the URL.  

To simulate this behavior, after we parse the cache files and cache archives 
configuration we should walk through all conflicting links and pick the first 
link that has a fragment to win.  If no link has a fragment then the first 
link simply wins.  All other conflicting links will get a warning and the name 
of the link will be changed to include a UUID.  If the same file is in the 
distributed cache as both a cache file and a cache archive we will throw an 
exception, for backwards compatibility.





[jira] [Created] (MAPREDUCE-4504) SortValidator writes to wrong directory

2012-08-01 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4504:
--

 Summary: SortValidator writes to wrong directory
 Key: MAPREDUCE-4504
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4504
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


SortValidator tries to write to jobConf.get("hadoop.tmp.dir", "/tmp"), but that 
is not intended to be an HDFS directory.  It should just be /tmp.





[jira] [Resolved] (MAPREDUCE-3320) Error conditions in web apps should stop pages from rendering.

2012-07-26 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-3320.


   Resolution: Invalid
Fix Version/s: (was: 0.24.0)

The UI actually will do a redirect back to itself with a cookie set indicating 
that an error happened.  This results in the page being redrawn with the error.

 Error conditions in web apps should stop pages from rendering.
 --

 Key: MAPREDUCE-3320
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3320
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans

 There are several places in the web apps where an error condition should 
 short circuit the page from rendering, but it does not.  Ideally the web app 
 framework should be extended to support exceptions similar to Jersey that can 
 have an HTTP return code associated with them.  Then all of the places that 
 produce custom error pages can just throw these exceptions instead. 





[jira] [Created] (MAPREDUCE-4458) Warn if java.library.path is used

2012-07-18 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4458:
--

 Summary: Warn if java.library.path is used
 Key: MAPREDUCE-4458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4458
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


If java.library.path is used on the command line for launching an MRAppMaster 
or an MR Task, it could conflict with how standard Hadoop/HDFS JNI libraries 
and dependencies are found.  At a minimum the client should output a warning 
and ask the user to switch to LD_LIBRARY_PATH.  It would be nice to 
automatically do this for them but parsing the command line is scary so just a 
warning is probably good enough for now.
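
A small sketch of the warning described above, assuming the client simply 
inspects the configured JVM options; the class name and wording are 
illustrative:

{code:title=Warning sketch (illustrative)}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

final class JavaLibraryPathCheck {
  private static final Log LOG = LogFactory.getLog(JavaLibraryPathCheck.class);

  // Called with the java opts configured for the MRAppMaster or task JVMs.
  static void warnIfSet(String javaOpts) {
    if (javaOpts != null && javaOpts.contains("-Djava.library.path")) {
      LOG.warn("java.library.path is set in the JVM options; this can conflict"
          + " with how Hadoop locates its native libraries. Consider using"
          + " LD_LIBRARY_PATH instead.");
    }
  }
}
{code}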





[jira] [Created] (MAPREDUCE-4423) Potential infinite fetching of map output

2012-07-10 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4423:
--

 Summary: Potential infinite fetching of map output
 Key: MAPREDUCE-4423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4423
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 3.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


Inside Fetcher.java there are a few cases where an error can happen and the 
corresponding map task is not marked as a fetch failure.  One of these is if 
the Shuffle server returns a malformed result.

MAPREDUCE-3992 makes this case a lot less common, but it is still possible.  If 
the shuffle handler always returns a malformed result, but an OK response, the 
Fetcher will never stop trying to fetch those results. 





[jira] [Created] (MAPREDUCE-4373) Fix Javadoc warnings in JobClient.

2012-06-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4373:
--

 Summary: Fix Javadoc warnings in JobClient.
 Key: MAPREDUCE-4373
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4373
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: Robert Joseph Evans


It looks like MAPREDUCE-4355 added in two new javadoc warnings.

{code}
[WARNING] 
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobClient.java:651:
 warning - @param argument jobid is not a parameter name.
[WARNING] 
/home/jenkins/jenkins-slave/workspace/PreCommit-MAPREDUCE-Build/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobClient.java:669:
 warning - @param argument jobid is not a parameter name.
{code}
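
The warning means the {{@param}} tag names a parameter that does not exist on the method; the fix is simply to make the tag match the declared parameter name. A minimal illustration of the pattern (not the actual JobClient methods):

{code}
// Minimal sketch of the fix pattern: the @param tag must use the declared
// parameter name, "jobId", not "jobid".
public class JavadocFixSketch {
  /**
   * Illustrative method.
   *
   * @param jobId the identifier of the job being queried
   * @return the same identifier, echoed back
   */
  public String describe(String jobId) {
    return jobId;
  }
}
{code}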

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4375) Show Configuration Traceability in MR UI

2012-06-26 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4375:
--

 Summary: Show Configuration Traceability in MR UI
 Key: MAPREDUCE-4375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4375
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans


Once HADOOP-8525 goes in we should provide a way for the Configuration UI to 
display the traceability information.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-4313) TestTokenCache doesn't compile due to a TokenCache.getDelegationToken compilation error

2012-06-05 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4313.


   Resolution: Fixed
Fix Version/s: 3.0.0
   2.0.1-alpha

I checked the small fix into branch-2 and trunk.  Those are the two branches 
where the change that broke this went in.

 TestTokenCache doesn't compile due to a TokenCache.getDelegationToken 
 compilation error
 --

 Key: MAPREDUCE-4313
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4313
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, test
Reporter: Eli Collins
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 2.0.1-alpha, 3.0.0


 Saw this on the trunk Jenkins job:
 {noformat}
 compile-mapred-test:
 [mkdir] Created dir: 
 /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/classes
 [mkdir] Created dir: 
 /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/testjar
 [mkdir] Created dir: 
 /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/testshell
 [javac] Compiling 95 source files to 
 /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/build/test/mapred/classes
 [javac] 
 /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapreduce/security/TestTokenCache.java:292:
  incompatible types
 [javac] found   : org.apache.hadoop.security.token.Token<capture#315 of ?>
 [javac] required: 
 org.apache.hadoop.security.token.Token<org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier>
 [javac] Token<DelegationTokenIdentifier> nnt = 
 TokenCache.getDelegationToken(
 [javac]   
   ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] 1 error
 {noformat}
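
The underlying problem is the usual generics mismatch: the call returns a wildcard-typed token, so assigning it to a concretely typed Token<DelegationTokenIdentifier> variable needs either a wildcard declaration or an explicit (unchecked) cast. A hedged illustration of the pattern, not the actual TestTokenCache fix or the real TokenCache signature:

{code}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Generic illustration of resolving the "incompatible types" error above;
// getSomeToken() is a stand-in, not the real TokenCache API.
public class TokenTypeSketch {
  static Token<? extends TokenIdentifier> getSomeToken() {
    return null;  // stand-in for a call that returns a wildcard-typed token
  }

  @SuppressWarnings("unchecked")
  static <T extends TokenIdentifier> Token<T> typedToken() {
    // Either keep the wildcard type (no cast needed)...
    Token<? extends TokenIdentifier> wildcard = getSomeToken();
    // ...or cast when the caller knows the concrete identifier type.
    return (Token<T>) wildcard;
  }
}
{code}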

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4301) Dedupe some strings in MRAM for memory savings

2012-06-01 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4301:
--

 Summary: Dedupe some strings in MRAM for memory savings
 Key: MAPREDUCE-4301
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4301
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 2.0.0-alpha, 0.23.3
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


Recently an OutOfMemoryError caused one of our jobs to become a zombie 
(MAPREDUCE-4300).  It was a rather large job with 78000+ map tasks and only 
750MB of heap configured.  I took a heap dump to see if there were any obvious 
memory leaks, and I could not find any, but YourKit and some digging found some 
potential memory optimizations that we could do.

In this particular case we could save about 20MB if 
SplitMetaInfoReader.readSplitMetaInfo only computed the JobSplitFile once 
instead of once per split (a two-line change).
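
The optimization is essentially hoisting the per-split string construction out of the loop so every split shares one instance; a self-contained toy of the pattern (it does not use the real SplitMetaInfoReader types):

{code}
// Self-contained toy of the hoisting pattern described above.
public class HoistingSketch {
  static String[] buildSplitFileNames(String jobSubmitDir, int numSplits) {
    // Before: the split file name was rebuilt once per split.
    // After: compute it once and share the same String for every split.
    String jobSplitFile = jobSubmitDir + "/job.split";
    String[] names = new String[numSplits];
    for (int i = 0; i < numSplits; i++) {
      names[i] = jobSplitFile;   // shared reference, no per-split duplicate
    }
    return names;
  }
}
{code}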

I will look into some others and see if there are more savings I can come up 
with.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings

2012-06-01 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4303:
--

 Summary: Look at using String.intern to dedupe some Strings
 Key: MAPREDUCE-4303
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 2.0.0-alpha, 0.23.3
Reporter: Robert Joseph Evans


MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are 
other places where it is not as simple to remove the duplicates.  In these 
cases the source of the strings is an incoming RPC call or parsing and 
reading in a file.  The only real way to dedupe these is either to use 
String.intern(), which if not used properly could result in the permgen space 
being filled up, or to play games with our own cache and do the same sort of 
thing as String.intern, but in the heap (see the sketch after the list below).

The following are some places where I saw lots of duplicate strings that we 
should look at doing something about.

TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
MapTaskAttemptImpl.diagnostics
The keys to Counters.groups
GenericGroup.displayName
The keys to GenericGroup.counters
and GenericCounter.displayName
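
As a rough illustration of the heap-based option, a tiny interner map could look like this (a sketch only, not an implementation proposed for Hadoop; a production version would want to bound or weakly reference the pool):

{code}
import java.util.concurrent.ConcurrentHashMap;

// Minimal heap-based interner sketch: dedupes equal Strings in the Java heap
// without touching the permgen-backed String.intern() pool.
public final class StringPoolSketch {
  private static final ConcurrentHashMap<String, String> POOL =
      new ConcurrentHashMap<String, String>();

  public static String dedupe(String s) {
    if (s == null) {
      return null;
    }
    String existing = POOL.putIfAbsent(s, s);
    return existing != null ? existing : s;
  }
}
{code}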

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.

2012-05-31 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4300:
--

 Summary: OOM in AM can turn it into a zombie.
 Key: MAPREDUCE-4300
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans


It looks like 4 threads in the AM died with OOM but not the one pinging the RM.

stderr for this AM
{noformat}
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
May 30, 2012 4:49:55 AM 
com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
 get
WARNING: You are attempting to use a deprecated API (specifically, attempting 
to @Inject ServletContext inside an eagerly created singleton. While we allow 
this for backwards compatibility, be warned that this MAY have unexpected 
behavior if you have more than one injector (with ServletModule) running in the 
same JVM. Please consult the Guice documentation at 
http://code.google.com/p/google-guice/wiki/Servlets for more information.
May 30, 2012 4:49:55 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver 
as a provider class
May 30, 2012 4:49:55 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
provider class
May 30, 2012 4:49:55 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a 
root resource class
May 30, 2012 4:49:55 AM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
May 30, 2012 4:49:55 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to 
GuiceManagedComponentProvider with the scope Singleton
May 30, 2012 4:49:56 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
GuiceManagedComponentProvider with the scope Singleton
May 30, 2012 4:49:56 AM 
com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
getComponentProvider
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to 
GuiceManagedComponentProvider with the scope PerRequest
Exception in thread ResponseProcessor for block 
BP-1114822160-IP-1322528669066:blk_-6528896407411719649_34227308 
java.lang.OutOfMemoryError: Java heap space
at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
at 
com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
at 
com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
at 
com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
at 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
Exception in thread DefaultSpeculator background processing 
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:462)
at java.util.HashMap.addEntry(HashMap.java:755)
at java.util.HashMap.put(HashMap.java:385)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
at 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
at 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
at 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
at 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
at 
org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
at java.lang.Thread.run(Thread.java:619)
Exception in thread Timer for 'MRAppMaster' metrics system 
java.lang.OutOfMemoryError: Java heap space
Exception in thread Socket Reader #4 for port 50500 
java.lang.OutOfMemoryError: Java heap space
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Resolved] (MAPREDUCE-4162) Correctly set token service

2012-05-10 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4162.


   Resolution: Fixed
Fix Version/s: 0.23.3

 Correctly set token service
 ---

 Key: MAPREDUCE-4162
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4162
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: client, mrv2
Affects Versions: 0.23.0, 0.24.0, 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 0.23.3, 2.0.0, 3.0.0

 Attachments: MAPREDUCE-4162.patch


 Use {{SecurityUtils.setTokenService}} to set token services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4237) TestNodeStatusUpdater can fail if localhost has a domain associated with it

2012-05-09 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4237:
--

 Summary: TestNodeStatusUpdater can fail if localhost has a domain 
associated with it
 Key: MAPREDUCE-4237
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4237
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


On some systems, such as the RHEL machines where I work, localhost can resolve to 
localhost.localdomain.  TestNodeStatusUpdater can fail because the nodeid 
contains .localdomain, which is not expected by the hard-coded localhost string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-4233) NPE can happen in RMNMNodeInfo.

2012-05-08 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4233:
--

 Summary: NPE can happen in RMNMNodeInfo.
 Key: MAPREDUCE-4233
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4233
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.3
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Critical


{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.getLiveNodeManagers(RMNMInfo.java:96)
at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
at 
com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
at javax.management.StandardMBean.getAttribute(StandardMBean.java:358)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
{noformat}

Looks like rmcontext.getRMNodes() is not kept in sync with 
scheduler.getNodeReport(), so the report can be null even though the 
context still knows about the node.

The simple fix is to add in a null check.
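
A self-contained toy of the null-check pattern being proposed (the real fix would live in RMNMInfo.getLiveNodeManagers(), whose internals are only approximated here):

{code}
import java.util.Map;

// Toy of the fix: the node map and the per-node report lookup can momentarily
// disagree, so a missing report is skipped rather than dereferenced.
public class NullCheckSketch {
  static int countNodesWithReports(Map<String, Object> nodes,
                                   Map<String, Object> reports) {
    int count = 0;
    for (String nodeId : nodes.keySet()) {
      Object report = reports.get(nodeId);
      if (report == null) {
        continue;  // scheduler has no report for this node yet; do not NPE
      }
      count++;
    }
    return count;
  }
}
{code}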

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (MAPREDUCE-4162) Correctly set token service

2012-05-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-4162:



Looks like even though the patch applies cleanly to branch-0.23 it is missing a 
dependency there.  I am reverting the change from branch-0.23 only, until the 
dependency can be addressed.

 Correctly set token service
 ---

 Key: MAPREDUCE-4162
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4162
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: client, mrv2
Affects Versions: 0.23.0, 0.24.0, 2.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 2.0.0, 3.0.0

 Attachments: MAPREDUCE-4162.patch


 Use {{SecurityUtils.setTokenService}} to set token services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-4208) The job is hanging up but never continuing until you kill the child process

2012-05-04 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-4208.


Resolution: Not A Problem

 The job is hanging up but never continuing until you kill the child process 
 

 Key: MAPREDUCE-4208
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4208
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: Hadoop 0.20.203.0
 Hbase 0.90.3
 Hive 0.80.1
Reporter: ccw

 I run a Hive MR query on HBase, but the job never ends.
 The job hangs and never continues until you kill the child process. 
 2012-04-28 18:22:33,661 Stage-1 map = 0%,  reduce = 0%
 2012-04-28 18:22:59,760 Stage-1 map = 25%,  reduce = 0%
 2012-04-28 18:23:04,782 Stage-1 map = 38%,  reduce = 0%
 2012-04-28 18:23:07,796 Stage-1 map = 50%,  reduce = 0%
 2012-04-28 18:23:08,801 Stage-1 map = 50%,  reduce = 8%
 2012-04-28 18:23:17,839 Stage-1 map = 50%,  reduce = 17%
 2012-04-28 18:23:19,848 Stage-1 map = 63%,  reduce = 17%
 2012-04-28 18:23:32,909 Stage-1 map = 63%,  reduce = 21%
 2012-04-28 18:23:57,017 Stage-1 map = 75%,  reduce = 21%
 2012-04-28 18:24:09,075 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:25:09,397 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:26:09,688 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:27:09,980 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:28:10,262 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:29:10,522 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:30:10,742 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:31:10,985 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:32:11,238 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:33:11,467 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:34:11,731 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:35:11,968 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:36:12,213 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:37:12,508 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:38:12,747 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:39:12,970 Stage-1 map = 75%,  reduce = 25%
 2012-04-28 18:40:13,205 Stage-1 map = 75%,  reduce = 25%
 I checked the TT log,
 2012-04-28 18:31:53,879 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:31:56,883 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:31:59,887 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:02,892 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:05,897 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:08,902 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:11,906 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:14,910 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:17,915 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:20,920 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:23,924 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:26,929 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:29,934 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:32,938 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:35,943 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:38,948 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:41,953 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:44,957 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:47,961 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:50,966 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:53,970 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:56,974 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:32:59,979 INFO org.apache.hadoop.mapred.TaskTracker: 
 attempt_201204281725_0002_m_02_0 0.0%
 2012-04-28 18:33:02,983 INFO org.apache.hadoop.mapred.TaskTracker: 
 

[jira] [Resolved] (MAPREDUCE-3958) RM: Remove RMNodeState and replace it with NodeState

2012-05-04 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-3958.


   Resolution: Fixed
Fix Version/s: (was: 0.23.2)
   3.0.0
   2.0.0

Thanks Bikas,

I put this into trunk and branch-2.  +1

 RM: Remove RMNodeState and replace it with NodeState
 

 Key: MAPREDUCE-3958
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3958
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 2.0.0, 3.0.0

 Attachments: MAPREDUCE-3958-1.patch, MAPREDUCE-3958-2.patch, 
 MAPREDUCE-3958-3.patch, MAPREDUCE-3958.patch


 RMNodeState is being sent over the wire after MAPREDUCE-3353. This has been 
 done by cloning the enum into NodeState in yarn protocol records.
 That makes RMNodeState redundant and it should be replaced with NodeState.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3050) YarnScheduler needs to expose Resource Usage Information

2011-09-20 Thread Robert Joseph Evans (JIRA)
YarnScheduler needs to expose Resource Usage Information


 Key: MAPREDUCE-3050
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3050
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 0.23.0, 0.24.0


Before the recent refactor the nodes had information in them about how many 
resources they were using.  This information is now hidden inside 
SchedulerNode.  Similarly, resource usage information about an application, or 
in aggregate, is only available through the Scheduler, and there is no interface 
to pull it out.

We need to expose APIs to get Resource and Container information from the 
scheduler, in aggregate across the entire cluster, per application, per node, 
and ideally also per queue if applicable (although there are no JIRAs I am 
aware of that need this right now).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3036) Some of the Resource Manager memory metrics go negative.

2011-09-19 Thread Robert Joseph Evans (JIRA)
Some of the Resource Manager memory metrics go negative.


 Key: MAPREDUCE-3036
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3036
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Blocker
 Fix For: 0.23.0, 0.24.0


ReservedGB seems to always be decremented when a container is released, even 
though the container never reserved any memory.
AvailableGB also seems to be able to go negative in a few situations.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3001) Map Reduce JobHistory and AppMaster UI should have ability to display task specific counters.

2011-09-13 Thread Robert Joseph Evans (JIRA)
Map Reduce JobHistory and AppMaster UI should have ability to display task 
specific counters.
-

 Key: MAPREDUCE-3001
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3001
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Priority: Minor
 Fix For: 0.23.0, 0.24.0


Map Reduce JobHistory and AppMaster UI should have the ability to display 
task-specific counters.  I think the best way to do this is to include a 
task-specific section, with task links, in the Nav Block when a task is 
selected.  The Counters page is already set up to deal with a task being passed in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3002) Delink History Context from AppContext

2011-09-13 Thread Robert Joseph Evans (JIRA)
Delink History Context from AppContext
--

 Key: MAPREDUCE-3002
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3002
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2
Affects Versions: 0.24.0
Reporter: Robert Joseph Evans


Currently the JobHistory Server has a HistoryContext that pretends to be a Map 
Reduce ApplicationMaster's AppContext so that UI pages can be shared between 
the two.  This is not ideal because the UIs have already diverged a lot, and we 
have to translate the native History Server's data into implementations of Job 
to provide the same interface.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (MAPREDUCE-2936) Contrib Raid compilation broken after HDFS-1620

2011-09-12 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-2936:



It looks like HDFS-1620 was just merged to branch-0.23 and needs this fix in it 
now.

 Contrib Raid compilation broken after HDFS-1620
 ---

 Key: MAPREDUCE-2936
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2936
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.24.0

 Attachments: MAPREDUCE-2936-20110906.txt


 After working around MAPREDUCE-2935 by removing TestServiceLevelAuthorization 
 and running the following:
 At the trunk level: mvn clean install package -Dtar -Pdist 
 -Dmaven.test.skip.exec=true
 In hadoop-mapreduce-project: ant compile-contrib -Dresolvers=internal
 yields 14 errors.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache

2011-09-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-2572.


Resolution: Duplicate

This is no longer relevant because MRv1 is deprecated.  MAPREDUCE-2969 will do 
the same work for MRv2.  

 Throttle the deletion of data from the distributed cache
 

 Key: MAPREDUCE-2572
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MR-2572-trunk-v1.patch, THROTTLING-security-v1.patch


 When deleting entries from the distributed cache we do so in a background 
 thread.  Once the size limit of the distributed cache is reached all unused 
 entries are deleted.  MAPREDUCE-2494 changes this so that entries are deleted 
 in LRU order until the usage falls below a given threshold.  In either of 
 these cases we are periodically flooding a disk with delete requests which 
 can slow down all IO operations to a drive.  It would be better to be able to 
 throttle this deletion so that it is spread out over a longer period of time. 
  This jira is to add in this throttling.
 On investigating it seems much simpler to backport MAPREDUCE-2494 to 20S 
 before implementing this change rather than try to implement it without LRU 
 deletion, because LRU goes a long way towards reducing the load on the disk 
 anyways.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2926) 500 Error in ResourceManager UI

2011-09-02 Thread Robert Joseph Evans (JIRA)
500 Error in ResourceManager UI
---

 Key: MAPREDUCE-2926
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2926
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0, 0.24.0


When accessing the resource manager UI the following is returned
{noformat}
Problem accessing /. Reason:

org.codehaus.jackson.type.JavaType.<init>(Ljava/lang/Class;)V

Caused by:

java.lang.NoSuchMethodError: 
org.codehaus.jackson.type.JavaType.<init>(Ljava/lang/Class;)V
at org.codehaus.jackson.map.type.TypeBase.<init>(TypeBase.java:15)
at org.codehaus.jackson.map.type.SimpleType.<init>(SimpleType.java:45)
at org.codehaus.jackson.map.type.SimpleType.<init>(SimpleType.java:40)
at 
org.codehaus.jackson.map.type.TypeBindings.<clinit>(TypeBindings.java:20)
at 
org.codehaus.jackson.map.type.TypeFactory._fromType(TypeFactory.java:530)
at org.codehaus.jackson.map.type.TypeFactory.type(TypeFactory.java:63)
at org.codehaus.jackson.map.ObjectMapper.<clinit>(ObjectMapper.java:179)
at org.apache.hadoop.yarn.webapp.Controller.<clinit>(Controller.java:43)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
com.google.inject.DefaultConstructionProxyFactory$2.newInstance(DefaultConstructionProxyFactory.java:81)
at 
com.google.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at 
com.google.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:111)
at com.google.inject.InjectorImpl$4$1.call(InjectorImpl.java:758)
at com.google.inject.InjectorImpl.callInContext(InjectorImpl.java:804)
at com.google.inject.InjectorImpl$4.get(InjectorImpl.java:754)
at com.google.inject.InjectorImpl.getInstance(InjectorImpl.java:793)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:136)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:216)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:141)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:63)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:122)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:110)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:892)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

Powered by Jetty://

{noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2927) CompletedJob.isUber throws a Yarn exception which makes the JobHistory UI unusable.

2011-09-02 Thread Robert Joseph Evans (JIRA)
CompletedJob.isUber throws a Yarn exception which makes the JobHistory UI 
unusable.
---

 Key: MAPREDUCE-2927
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2927
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0, 0.24.0


CompletedJob.isUber on the MR-279 branch returns jobInfo.getIsUber() but got 
turned into an exception when MR-279 was merged to trunk. SVN Revision 1159166.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2913) TestMRJobs.testFailingMapper does not assert the correct thing.

2011-08-30 Thread Robert Joseph Evans (JIRA)
TestMRJobs.testFailingMapper does not assert the correct thing.
---

 Key: MAPREDUCE-2913
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2913
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0, 0.24.0
Reporter: Robert Joseph Evans
 Fix For: 0.23.0, 0.24.0


{code}
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, 
events[0].getStatus().FAILED);
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, 
events[1].getStatus().FAILED);
{code}

when optimized would be

{code}
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, 
TaskCompletionEvent.Status.FAILED);
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, 
TaskCompletionEvent.Status.FAILED);
{code}

obviously these assertions will never fail.  If we remove the 
{code}.FAILED{code} the asserts no longer pass. This could be because MRApp 
mocks out the task launcher and never actually launches anything.
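
Presumably the intent was to compare the actual status of each event, along these lines (a sketch of the intent, not the final patch):

{code}
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, events[0].getStatus());
Assert.assertEquals(TaskCompletionEvent.Status.FAILED, events[1].getStatus());
{code}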

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2876) ContainerAllocationExpirer appears to use the incorrect configs

2011-08-24 Thread Robert Joseph Evans (JIRA)
ContainerAllocationExpirer appears to use the incorrect configs
---

 Key: MAPREDUCE-2876
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2876
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0


ContainerAllocationExpirer sets the expiration interval to be 
RMConfig.CONTAINER_LIVELINESS_MONITORING_INTERVAL but uses 
AMLIVELINESS_MONITORING_INTERVAL as the interval.  This is very different from 
what AMLivelinessMonitor does.

There should be two configs RMConfig.CONTAINER_LIVELINESS_MONITORING_INTERVAL 
for the monitoring interval and RMConfig.CONTAINER_EXPIRY_INTERVAL for the 
expiry.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2865) MRV2 Job.java needs javadocs in it.

2011-08-22 Thread Robert Joseph Evans (JIRA)
MRV2 Job.java needs javadocs in it.
---

 Key: MAPREDUCE-2865
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2865
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
 Fix For: 0.23.0


This may fall under another JIRA already filed, but Job.java in the MRv2 client 
needs to have javadocs in it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2864) Renaming of configuration property names in yarn

2011-08-19 Thread Robert Joseph Evans (JIRA)
Renaming of configuration property names in yarn


 Key: MAPREDUCE-2864
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2864
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2, nodemanager, resourcemanager
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0


Now that YARN has been put into trunk we should do something similar to 
MAPREDUCE-849.  We should go back and look at all of the configurations that 
have been added in and rename them as needed to be consistent and subdivided by 
component.

# We should use all lowercase in the config names. e.g., we should use 
appsmanager instead of appsManager etc.
# history server config names should be prefixed with mapreduce instead of yarn.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2772) MR-279: mrv2 no longer compiles against trunk after common mavenization.

2011-08-03 Thread Robert Joseph Evans (JIRA)
MR-279: mrv2 no longer compiles against trunk after common mavenization.


 Key: MAPREDUCE-2772
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2772
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0
 Attachments: yarn-common-mvn.patch

mrv2 no longer compiles against trunk after common mavenization

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2756) JobControl can drop jobs if an error occurs

2011-07-29 Thread Robert Joseph Evans (JIRA)
JobControl can drop jobs if an error occurs
---

 Key: MAPREDUCE-2756
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2756
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Minor
 Fix For: 0.23.0


If you run a Pig job with UDFs that have not been recompiled for MRv2, there 
are situations where Pig will fail with an error message stating that Hadoop 
failed and did not give a reason.  There is even the possibility of deadlock if 
an Error is thrown and the JobControl thread dies.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2723) MR-279: port MAPREDUCE-2324 to mrv2

2011-07-22 Thread Robert Joseph Evans (JIRA)
MR-279: port MAPREDUCE-2324 to mrv2
---

 Key: MAPREDUCE-2723
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2723
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0


MRV2 currently does not take reduce disk usage into account when trying to 
schedule a container.  For feature parity with the original map reduce it 
should be extended to allow for disk space requests within containers along 
with RAM requests.  We then also need to port MAPREDUCE-2324 to the scheduler 
to allow it to avoid starvation of containers that might never get the 
resources that they need.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache

2011-07-22 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-2572.


Resolution: Won't Fix

I filed this, but the more I think about it, setting the amount of the 
distributed cache to keep around between cleanings to a high number really 
seems like the best way to deal with this.  Since it is just a configuration 
value there is no need to make any changes to the code, so I will just close 
this as Won't Fix.

 Throttle the deletion of data from the distributed cache
 

 Key: MAPREDUCE-2572
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: THROTTLING-security-v1.patch


 When deleting entries from the distributed cache we do so in a background 
 thread.  Once the size limit of the distributed cache is reached all unused 
 entries are deleted.  MAPREDUCE-2494 changes this so that entries are deleted 
 in LRU order until the usage falls below a given threshold.  In either of 
 these cases we are periodically flooding a disk with delete requests which 
 can slow down all IO operations to a drive.  It would be better to be able to 
 throttle this deletion so that it is spread out over a longer period of time. 
  This jira is to add in this throttling.
 On investigating it seems much simpler to backport MAPREDUCE-2494 to 20S 
 before implementing this change rather than try to implement it without LRU 
 deletion, because LRU goes a long way towards reducing the load on the disk 
 anyways.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-2684) Job Tracker can starve reduces with very large input.

2011-07-14 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved MAPREDUCE-2684.


Resolution: Duplicate

 Job Tracker can starve reduces with very large input.
 -

 Key: MAPREDUCE-2684
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2684
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.204.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans

 If mapreduce.reduce.input.limit is mis-configured or if a cluster is just 
 running low on disk space in general, then reduces with a large input may 
 never get scheduled, causing the Job to never fail and never succeed, just 
 starve until the job is killed.
 The JobInProgress tries to guess at the size of the input to all reducers in a 
 job.  If the size is over mapreduce.reduce.input.limit then the job is 
 killed.  If it is not, then findNewReduceTask() checks to see if the estimated 
 size is too big to fit on the node currently looking for work.  If it will not 
 fit, it will let some other task have a chance at the slot.
 The idea is to keep track of how often it happens that a Reduce Slot is 
 rejected because of the lack of space vs how often it succeeds, and then guess 
 whether the reduce tasks will ever be scheduled.
 So I would like some feedback on this.
 1) How should we guess?  Someone who found the bug here suggested P1 + (P2 * 
 S), where S is the number of successful assignments.  Possibly P1 = 20 and P2 
 = 2.0.  I am not really sure.
 2) What should we do when we guess that it will never get a slot?  Should we 
 fail the job, or do we say, even though it might fail, let's just schedule 
 it and see if it really will fail.
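
To make option 1 concrete, the suggested heuristic would flag starvation once the number of rejected assignments exceeds P1 + (P2 * S); a toy sketch using the constants floated above (they are only the suggested values, not a tested recommendation):

{code}
// Toy version of the proposed starvation heuristic; the constants are just the
// values suggested in the description, not tuned recommendations.
public class StarvationHeuristicSketch {
  static final double P1 = 20.0;  // base allowance of rejections
  static final double P2 = 2.0;   // extra allowance per successful assignment

  static boolean looksStarved(long rejectedAssignments, long successfulAssignments) {
    return rejectedAssignments > P1 + (P2 * successfulAssignments);
  }
}
{code}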

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2672) MR-279: JobHistory Server needs Analysis this job

2011-07-12 Thread Robert Joseph Evans (JIRA)
MR-279: JobHistory Server needs Analysis this job
-

 Key: MAPREDUCE-2672
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2672
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0


The JobHistory Server needs to implement the "Analysis this job" functionality 
from the previous server.

This should include the following info
Hadoop Job ID 
User : 
JobName : 
JobConf : 
Submitted At : 
Launched At :  (including duration)
Finished At :  (including duration)
Status :

Time taken by best performing Map task TASK_LINK:
Average time taken by Map tasks:
Worst performing map tasks: (including task links and duration)
The last Map task TASK_LINK finished at (relative to the Job launch time):  
(including duration)

Time taken by best performing shuffle TASK_LINK:
Average time taken by shuffle:
Worst performing Shuffles: (including task links and duration)
The last Shuffle TASK_LINK finished at (relative to the Job launch time):  
(including duration)

Time taken by best performing Reduce task TASK_LINK:
Average time taken by Reduce tasks:
Worst performing reduce tasks: (including task links and duration)
The last Reduce task TASK_LINK finished at (relative to the Job launch time): 
 (including duration)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2673) MR-279: JobHistory Server should not refresh

2011-07-12 Thread Robert Joseph Evans (JIRA)
MR-279: JobHistory Server should not refresh


 Key: MAPREDUCE-2673
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2673
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Priority: Minor
 Fix For: 0.23.0


The Job History Server UI is based off of the Application Master UI, which 
refreshes the page for jobs regularly.  The page should not refresh at all for 
the JobHistory, because the job has finished and is not changing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2674) MR-279: JobHistory Server should not use tables for layout

2011-07-12 Thread Robert Joseph Evans (JIRA)
MR-279: JobHistory Server should not use tables for layout
--

 Key: MAPREDUCE-2674
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2674
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Priority: Minor


The Job History Server web pages use table tags for the layout of the various 
elements on the page.  This is not a very maintainable way of laying out a web 
page.  The ideal is to let CSS do all of the layout and have the document 
itself just have data in it.  This is especially important because there are 
currently no APIs to pull some of this data out, and as such there are tools 
that scrape these pages.  If we can separate out the layout then even when the 
layout changes the scrapers will not be impacted. 

This should probably be investigated in the rest of the UI too.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2675) MR-279: JobHistory Server main page needs to be reformatted

2011-07-12 Thread Robert Joseph Evans (JIRA)
MR-279: JobHistory Server main page needs to be reformatted
---

 Key: MAPREDUCE-2675
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2675
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


The main page of the Job History Server is based off of the Application Master 
code.  It needs to be reformatted to be more useful and better match what was 
there before.

- The Active Jobs title needs to be replaced with something more appropriate 
(i.e. Retired Jobs)
- The table of jobs should have the following columns in it
  - Submit time, Job Id, Job Name, User, and (just because I think it would be 
useful) state, maps completed, maps failed, reduces completed, reduces failed
- The table needs more advanced filtering, something like 
http://datatables.net/release-datatables/examples/api/multi_filter.html This is 
to match the previous search functionality.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2676) MR-279: JobHistory Job page needs reformatted

2011-07-12 Thread Robert Joseph Evans (JIRA)
MR-279: JobHistory Job page needs reformatted
-

 Key: MAPREDUCE-2676
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2676
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans


The Job page, the Maps page, and the Reduces page for the job history server 
need to be reformatted.

The Job Overview needs to add in the User, a link to the Job Conf, and the Job 
ACLs.
It also needs Submitted at, Launched at, and Finished at, depending on how they 
relate to Started and Elapsed.

In the attempts table we need to remove the new and running columns.
In the tasks table we need to remove the progress, pending, and running columns 
and add in a failed count column.
We also need to investigate what it would take to add in setup and cleanup 
statistics.  Perhaps these should more generally be Application Master 
statistics and links.

The Maps page and Reduces page should have the progress column removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2668) MR-279: APPLICATION_STOP is never sent to AuxServices

2011-07-11 Thread Robert Joseph Evans (JIRA)
MR-279: APPLICATION_STOP is never sent to AuxServices
-

 Key: MAPREDUCE-2668
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2668
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
 Fix For: 0.23.0


APPLICATION_STOP is never sent to the AuxServices, only APPLICATION_INIT.  This 
means that the intermediate map output data will never be deleted.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2666) MR-279: Need to retrieve shuffle port number on AplicationMaster restart

2011-07-08 Thread Robert Joseph Evans (JIRA)
MR-279: Need to retrieve shuffle port number on AplicationMaster restart


 Key: MAPREDUCE-2666
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2666
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0


MAPREDUCE-2652 allows ShuffleHandler to return the port it is operating on.  In 
the case of an ApplicationMaster crash where it needs to be restarted that 
information is lost.  We either need to re-query it from each of the 
NodeManagers or to persist it to the JobHistory logs and retrieve it again.  
The job history logs are probably the simpler solution.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2652) MR-279: Cannot run multiple NMs on a single node

2011-07-07 Thread Robert Joseph Evans (JIRA)
MR-279: Cannot run multiple NMs on a single node 
-

 Key: MAPREDUCE-2652
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2652
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


Currently in MR-279 the Auxiliary services, like ShuffleHandler, have no way to 
communicate information back to the applications.  Because of this the Map 
Reduce Application Master has hardcoded in a port of 8080 for shuffle.  This 
prevents the configuration mapreduce.shuffle.port from ever being set to 
anything but 8080.  The code should be updated to allow this information to be 
returned to the application master.  Also the data needs to be persisted to the 
task log so that on restart the data is not lost.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2659) MR-279: ShuffleHandler should use Protocol Buffers for ServiceData

2011-07-07 Thread Robert Joseph Evans (JIRA)
MR-279: ShuffleHandler should use Protocol Buffers for ServiceData
--

 Key: MAPREDUCE-2659
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2659
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Robert Joseph Evans
 Fix For: 0.23.0


Auxiliary Services (specifically ShuffleHandler) should use ProtocolBuffers for 
storing/retrieving data in the ByteBuffer.  Right now there are TODOs to have 
the format include a version number, but if we want true wire compatibility we 
should use the same system we are using elsewhere in the code for messages, not 
something invented as we go along.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (MAPREDUCE-2494) Make the distributed cache delete entries using LRU priority

2011-06-08 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-2494:



Reopening to add in patch for 0.20.2XX branch

 Make the distributed cache delete entries using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch


 Currently the distributed cache will wait until a cache directory is above a 
 preconfigured threshold, at which point it will delete all entries that are 
 not currently being used.  It seems like we would get far fewer cache misses 
 if we kept some of them around, even when they are not being used.  We should 
 add in a configurable percentage as a goal for how much of the cache should 
 remain clear when not in use, and select objects to delete based on how 
 recently they were used, and possibly also how large they are/how difficult 
 it is to download them again.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache

2011-06-07 Thread Robert Joseph Evans (JIRA)
Throttle the deletion of data from the distributed cache


 Key: MAPREDUCE-2572
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.20.205.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


When deleting entries from the distributed cache we do so in a background 
thread.  Once the size limit of the distributed cache is reached, all unused 
entries are deleted.  MAPREDUCE-2494 changes this so that entries are deleted 
in LRU order until the usage falls below a given threshold.  In either case we 
periodically flood a disk with delete requests, which can slow down all I/O 
operations to that drive.  It would be better to throttle this deletion so 
that it is spread out over a longer period of time.  This JIRA is to add that 
throttling.

On investigation it seems much simpler to backport MAPREDUCE-2494 to 20S 
before implementing this change, rather than trying to implement it without 
LRU deletion, because LRU already goes a long way towards reducing the load 
on the disk.
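
A sketch of one possible throttle; the delay-between-deletes approach and the 
names here are assumptions, not the eventual patch.  The point is simply to 
spread the deletes out rather than issuing them in one burst.

{code:java}
import java.io.File;
import java.util.List;

/** Delete cache entries one at a time, pausing between each delete. */
public class ThrottledDeleter {
  private final long pauseBetweenDeletesMs;

  public ThrottledDeleter(long pauseBetweenDeletesMs) {
    this.pauseBetweenDeletesMs = pauseBetweenDeletesMs;
  }

  public void deleteAll(List<File> victims) throws InterruptedException {
    for (File f : victims) {
      if (!f.delete()) {
        // Log and keep going; a stuck entry should not stop the cleanup pass.
        System.err.println("Could not delete " + f);
      }
      Thread.sleep(pauseBetweenDeletesMs);   // the actual throttle
    }
  }
}
{code}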

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (MAPREDUCE-2535) JobClient creates a RunningJob with null status and profile

2011-06-06 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans reopened MAPREDUCE-2535:



The fix is good, but it broke the system tests.  Reopening the bug to add a 
patch that fixes the tests.

 JobClient creates a RunningJob with null status and profile
 ---

 Key: MAPREDUCE-2535
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2535
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.204.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MR-2535-0.20.20X-V1.patch


 The exception occurred because the job was retired and removed from the 
 RetireJobCache and the CompletedJobStatusStore.  But JobClient creates a 
 RunningJob with a null status and profile if getJob(JobID) is called again.  
 So even though a not-null check is present in the following user code, it did 
 not help:
 runningJob = jobClient.getJob(mapRedJobID);
 if(runningJob != null) {
 JobClient.getJob() should return null if the status is null.
 In trunk this is fixed by validating that the job status is not null every 
 time it is updated, and also verifying that the profile data is not null when 
 it is created.
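
A minimal sketch of the guard being asked for, using a hypothetical 
fetchStatus() stand-in rather than the real JobClient internals: when the 
cluster no longer has a status for the retired job, the client hands back 
null, so the caller's existing != null check actually protects it.

{code:java}
/** Illustrative stand-in types only; not the actual JobClient code. */
public class RetiredJobLookup {

  public static class Status {
    final String jobId;
    Status(String jobId) { this.jobId = jobId; }
  }

  /** Simulates the cluster lookup for a job that has been retired and purged. */
  static Status fetchStatus(String jobId) {
    return null;
  }

  /** The guard: no status means no wrapper object, just null. */
  public static Status getJob(String jobId) {
    Status status = fetchStatus(jobId);
    return (status == null) ? null : status;
  }

  public static void main(String[] args) {
    Status job = getJob("job_201105270000_0001");
    System.out.println(job != null ? "job found" : "job retired; skipping");
  }
}
{code}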

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2539) NPE when calling JobClient.getMapTaskReports for retired job

2011-05-27 Thread Robert Joseph Evans (JIRA)
NPE when calling JobClient.getMapTaskReports for retired job


 Key: MAPREDUCE-2539
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2539
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


Calling JobClient.getMapTaskReports for a retired job results in an NPE.  In 
the 0.20.* versions an empty TaskReport array was returned instead.

Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.mapred.JobClient.getMapTaskReports(JobClient.java:588)
at 
org.apache.pig.tools.pigstats.JobStats.addMapReduceStatistics(JobStats.java:388)
..
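
A caller-side sketch of the expected behavior (a defensive workaround, not the 
fix itself), assuming a hypothetical wrapper method: treat a missing report 
array for a retired job as empty rather than dereferencing null.

{code:java}
import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

public class SafeTaskReports {
  /** Returns an empty array instead of null for retired jobs. */
  public static TaskReport[] mapReportsOrEmpty(JobClient client, JobID jobId)
      throws IOException {
    TaskReport[] reports = client.getMapTaskReports(jobId);
    return reports != null ? reports : new TaskReport[0];
  }
}
{code}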

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2494) Make the distributed cache delete entries using LRU priority

2011-05-13 Thread Robert Joseph Evans (JIRA)
Make the distributed cache delete entries using LRU priority


 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


Currently the distributed cache will wait until a cache directory is above a 
preconfigured threshold, at which point it deletes all entries that are not 
currently in use.  It seems like we would get far fewer cache misses if we 
kept some of them around, even when they are not being used.  We should add a 
configurable target for how much of the cache should remain free when not in 
use, and select objects to delete based on how recently they were used, and 
possibly also on how large they are and how difficult they are to download 
again.
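
A sketch of the LRU selection described above; the class and field names, the 
target fraction, and the in-use flag are illustrative assumptions, not the 
attached patch.  Entries still in use are never deleted, and eviction stops as 
soon as usage drops below the configured goal.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LruCacheCleaner {

  public static class Entry {
    final String path;
    final long sizeBytes;
    final long lastUsedMillis;
    final boolean inUse;

    public Entry(String path, long sizeBytes, long lastUsedMillis, boolean inUse) {
      this.path = path;
      this.sizeBytes = sizeBytes;
      this.lastUsedMillis = lastUsedMillis;
      this.inUse = inUse;
    }
  }

  /** Returns entries to delete, oldest first, until usage <= limit * targetFraction. */
  public static List<Entry> selectVictims(List<Entry> entries, long limitBytes,
                                          double targetFraction) {
    long used = entries.stream().mapToLong(e -> e.sizeBytes).sum();
    long target = (long) (limitBytes * targetFraction);

    List<Entry> candidates = new ArrayList<>();
    for (Entry e : entries) {
      if (!e.inUse) {            // never delete an entry a task is still using
        candidates.add(e);
      }
    }
    candidates.sort(Comparator.comparingLong(e -> e.lastUsedMillis));  // LRU first

    List<Entry> victims = new ArrayList<>();
    for (Entry e : candidates) {
      if (used <= target) {
        break;                   // already under the goal, keep the rest cached
      }
      victims.add(e);
      used -= e.sizeBytes;
    }
    return victims;
  }
}
{code}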

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check if it has died for some reason

2011-05-13 Thread Robert Joseph Evans (JIRA)
The distributed cache cleanup thread has no monitoring to check if it has 
died for some reason
-

 Key: MAPREDUCE-2495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Minor


The cleanup thread in the distributed cache handles IOExceptions and the like 
correctly, but just to be a bit more defensive it would be good to monitor the 
thread and regularly check that it is still alive, so that the distributed 
cache does not fill up the entire disk on the node. 
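
One possible watchdog, sketched under the assumption that the cleanup task is 
a plain Runnable (the names are illustrative, not the eventual change): a 
scheduled monitor checks that the cleanup thread is still alive and restarts 
it if it has died.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CleanupThreadWatchdog {
  private final Runnable cleanupTask;
  private volatile Thread cleanupThread;

  public CleanupThreadWatchdog(Runnable cleanupTask) {
    this.cleanupTask = cleanupTask;
  }

  private void startCleanupThread() {
    cleanupThread = new Thread(cleanupTask, "dist-cache-cleanup");
    cleanupThread.setDaemon(true);
    cleanupThread.start();
  }

  /** Starts the cleanup thread plus a monitor that revives it if it dies. */
  public void start(long checkIntervalSeconds) {
    startCleanupThread();
    ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor();
    monitor.scheduleAtFixedRate(() -> {
      if (cleanupThread == null || !cleanupThread.isAlive()) {
        // The cleanup thread died unexpectedly; log and bring it back.
        System.err.println("Distributed cache cleanup thread died; restarting");
        startCleanupThread();
      }
    }, checkIntervalSeconds, checkIntervalSeconds, TimeUnit.SECONDS);
  }
}
{code}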

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2479) Backport MAPREDUCE-1568 to hadoop security branch

2011-05-09 Thread Robert Joseph Evans (JIRA)
Backport MAPREDUCE-1568 to hadoop security branch
-

 Key: MAPREDUCE-2479
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2479
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira