[jira] [Resolved] (MAPREDUCE-3048) Fix test-patch to run tests via mvn clean install test

2011-09-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3048.


Resolution: Fixed
  Assignee: Vinod Kumar Vavilapalli

I don't see any objections to this, and it is really needed and useful.

I just committed this to trunk and branch-0.23.

 Fix test-patch to run tests via mvn clean install test
 

 Key: MAPREDUCE-3048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3048
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.0

 Attachments: MAPREDUCE-3048.txt


 Some tests, like the ones failing at MAPREDUCE-3040, depend on the generated 
 jars. TestMRJobs, for example, won't run if we simply run mvn clean test.
 I propose that we change test-patch to run tests using mvn clean install 
 test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hadoop-Mapreduce-trunk-Commit - Build # 954 - Still Failing

2011-09-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/954/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14024 lines...]
[junit] Running org.apache.hadoop.mapred.TestMapRed
[junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.319 sec
[junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
[junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.517 sec
[junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.576 sec
[junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.479 sec
[junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 31.85 sec
[junit] Running org.apache.hadoop.mapred.TestReduceTask
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.619 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.761 sec
[junit] Running 
org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.023 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.201 sec
[junit] Running org.apache.hadoop.mapred.TestSeveral
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 42.722 sec
[junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.016 sec
[junit] Running org.apache.hadoop.mapred.TestTaskLimits
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.889 sec
[junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.716 sec
[junit] Running org.apache.hadoop.mapred.TestTextInputFormat
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 56.338 sec
[junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.18 sec
[junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.014 sec
[junit] Running org.apache.hadoop.mapreduce.TestCounters
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.35 sec
[junit] Running org.apache.hadoop.mapreduce.TestMapCollection
[junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.62 sec
[junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
[junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.319 sec
[junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.946 sec
[junit] Running 
org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec

checkfailure:
[touch] Creating 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816:
 Tests failed!

Total time: 5 minutes 58 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-3048
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


Hadoop-Mapreduce-trunk-Commit - Build # 955 - Still Failing

2011-09-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/955/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14024 lines...]
[junit] Running org.apache.hadoop.mapred.TestMapRed
[junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.316 sec
[junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
[junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 29.841 sec
[junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
[junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.488 sec
[junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 38.045 sec
[junit] Running org.apache.hadoop.mapred.TestReduceTask
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.616 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.77 sec
[junit] Running 
org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.941 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.729 sec
[junit] Running org.apache.hadoop.mapred.TestSeveral
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 42.979 sec
[junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.203 sec
[junit] Running org.apache.hadoop.mapred.TestTaskLimits
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.888 sec
[junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.699 sec
[junit] Running org.apache.hadoop.mapred.TestTextInputFormat
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 92.938 sec
[junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.178 sec
[junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.112 sec
[junit] Running org.apache.hadoop.mapreduce.TestCounters
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.347 sec
[junit] Running org.apache.hadoop.mapreduce.TestMapCollection
[junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.614 sec
[junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
[junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.134 sec
[junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.944 sec
[junit] Running 
org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec

checkfailure:
[touch] Creating 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816:
 Tests failed!

Total time: 6 minutes 34 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating HDFS-46
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Created] (MAPREDUCE-3069) running test-patch gives me 10 warnings for missing 'build.plugins.plugin.version' of org.apache.rat:apache-rat-plugin

2011-09-22 Thread Ravi Prakash (JIRA)
running test-patch gives me 10 warnings for missing 
'build.plugins.plugin.version' of org.apache.rat:apache-rat-plugin
--

 Key: MAPREDUCE-3069
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3069
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.23.0


apache-rat-plugin doesn't have a version specified in the hadoop-mapreduce-project 
and hadoop-yarn pom.xml files.





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3071) app master configuration web UI link under the Job menu opens up application menu

2011-09-22 Thread Thomas Graves (JIRA)
app master configuration web UI link under the Job menu opens up application 
menu
-

 Key: MAPREDUCE-3071
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3071
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
 Fix For: 0.23.0


If you go to the app master web UI for a particular job, the job menu on the 
left side displays links for overview, counters, configuration, etc.

If you click on the configuration link, it closes the job menu and opens the 
application menu on that left side. It shouldn't do this; it should leave the 
job menu open.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hadoop-Mapreduce-trunk-Commit - Build # 956 - Still Failing

2011-09-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/956/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14024 lines...]
[junit] Running org.apache.hadoop.mapred.TestMapRed
[junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.408 sec
[junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
[junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 29.5 sec
[junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
[junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.522 sec
[junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.15 sec
[junit] Running org.apache.hadoop.mapred.TestReduceTask
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.619 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.762 sec
[junit] Running 
org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.999 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.114 sec
[junit] Running org.apache.hadoop.mapred.TestSeveral
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 43.656 sec
[junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.17 sec
[junit] Running org.apache.hadoop.mapred.TestTaskLimits
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.885 sec
[junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.714 sec
[junit] Running org.apache.hadoop.mapred.TestTextInputFormat
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 51.031 sec
[junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.181 sec
[junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 47.079 sec
[junit] Running org.apache.hadoop.mapreduce.TestCounters
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.347 sec
[junit] Running org.apache.hadoop.mapreduce.TestMapCollection
[junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.621 sec
[junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
[junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.316 sec
[junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.943 sec
[junit] Running 
org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.583 sec

checkfailure:
[touch] Creating 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816:
 Tests failed!

Total time: 6 minutes 0 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-2754
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


[jira] [Resolved] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption

2011-09-22 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved MAPREDUCE-3065.


Resolution: Fixed

Per Vinod, new ticket to track this: 
https://issues.apache.org/jira/browse/MAPREDUCE-3068

 ApplicationMaster killed by NodeManager due to excessive virtual memory 
 consumption
 ---

 Key: MAPREDUCE-3065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.24.0
Reporter: Chris Riccomini

  Hey Vinod,
  
  OK, so I have a little more clarity into this.
  
  When I bump my resource request for my AM to 4096, it runs. The important 
  line in the NM logs is:
  
  2011-09-21 13:43:44,366 INFO  monitor.ContainersMonitorImpl 
  (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 25656 
  for container-id container_1316637655278_0001_01_01 : Virtual 
  2260938752 bytes, limit : 4294967296 bytes; Physical 120860672 bytes, limit 
  -1 bytes
  
  The thing to note is the virtual memory, which is off the charts, even 
  though my physical memory is almost nothing (12 megs). I'm still poking 
  around the code, but I am noticing that there are two checks in the NM: one 
  for virtual mem, and one for physical mem. The virtual memory check appears 
  to be toggleable, but presumably defaults to on.
  
  At this point I'm trying to figure out exactly what the VMEM check is for, 
  why YARN thinks my app is taking 2 gigs, and how to fix this.
  
  Cheers,
  Chris
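
For reference, the virtual-memory check Chris is describing is driven by NodeManager configuration. A minimal sketch, assuming the property names used by later Hadoop 2.x releases (the 0.23 keys may differ):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeManagerMemoryCheckSketch {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();
        // Assumed 2.x property name: turn the virtual-memory check off entirely...
        conf.setBoolean("yarn.nodemanager.vmem-check-enabled", false);
        // ...or keep it on but allow more virtual memory per unit of requested
        // physical memory (JVMs routinely reserve far more vmem than they touch).
        conf.setFloat("yarn.nodemanager.vmem-pmem-ratio", 4.0f);
        System.out.println("vmem/pmem ratio = " + conf.get("yarn.nodemanager.vmem-pmem-ratio"));
    }
}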
  
  From: Chris Riccomini [criccom...@linkedin.com]
  Sent: Wednesday, September 21, 2011 1:42 PM
  To: mapreduce-dev@hadoop.apache.org
  Subject: Re: ApplicationMaster Memory Usage
  
  For the record, I bumped the memory resource request to 4096, and it works.
  :(
  
  
  On 9/21/11 1:32 PM, Chris Riccomini criccom...@linkedin.com wrote:
  
  Hey Vinod,
  
  So, I ran my application master directly from the CLI. I commented out the
  YARN-specific code. It runs fine without leaking memory.
  
  I then ran it from YARN, with all YARN-specific code commented out. It again
  ran fine.
  
  I then uncommented JUST my registerWithResourceManager call. It then fails
  with OOM after a few seconds. I call registerWithResourceManager, and then 
  go
  into a while(true) { println(yeh) sleep(1000) }. Doing this prints:
  
  yeh
  yeh
  yeh
  yeh
  yeh
  
  At which point it dies, and in the NodeManager I see:
  
  2011-09-21 13:24:51,036 WARN  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:isProcessTreeOverLimit(289)) - Process tree for
  container: container_1316626117280_0005_01_01 has processes older than 
  1
  iteration running over the configured limit. Limit=2147483648, current 
  usage =
  2192773120
  2011-09-21 13:24:51,037 WARN  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:run(453)) - Container
  [pid=23852,containerID=container_1316626117280_0005_01_01] is running
  beyond memory-limits. Current usage : 2192773120bytes. Limit :
  2147483648bytes. Killing container.
  Dump of the process-tree for container_1316626117280_0005_01_01 :
  |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
  SYSTEM_TIME(MILLIS)
  VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
  |- 23852 20570 23852 23852 (bash) 0 0 108638208 303 /bin/bash -c java 
  -Xmx512M
  -cp './package/*' kafka.yarn.ApplicationMaster
  /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
  com.linkedin.TODO 1
  1/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
  001/stdout
  2/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
  001/stderr
  |- 23855 23852 23852 23852 (java) 81 4 2084134912 14772 java -Xmx512M -cp
  ./package/* kafka.yarn.ApplicationMaster
  /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
  com.linkedin.TODO 1
  2011-09-21 13:24:51,037 INFO  monitor.ContainersMonitorImpl
  (ContainersMonitorImpl.java:run(463)) - Removed ProcessTree with root 23852
  
  Either something is leaking in YARN, or my registerWithResourceManager code
  (see below) is doing something funky.
  
  I'm trying to avoid going through all the pain of attaching a remote 
  debugger.
  Presumably things aren't leaking in YARN, which means it's likely that I'm
  doing something wrong in my registration code.
  
  Incidentally, my NodeManager is running with 1000 megs. My application master
  memory is set to 2048, and my -Xmx setting is 512M.
  
  Cheers,
  Chris
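
His registerWithResourceManager code is not reproduced in this digest. As a rough illustration only, the sketch below shows AM registration written against the AMRMClient helper from later Hadoop releases; the raw 0.23 protocol differs, so treat the names as assumptions rather than the code discussed here.

import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmRegistrationSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        // Client-side helper that speaks the AM <-> RM protocol.
        AMRMClient<AMRMClient.ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(conf);
        rm.start();
        // Register this AM; host, RPC port, and tracking URL are placeholders.
        rm.registerApplicationMaster("localhost", 0, "");
        // ... request containers, heartbeat via rm.allocate(progress), do work ...
        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
    }
}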
  
  From: Vinod Kumar Vavilapalli [vino...@hortonworks.com]
  Sent: Wednesday, September 21, 2011 11:52 AM
  To: 

[jira] [Created] (MAPREDUCE-3072) NodeManager doesn't recognize kill -9 of AM container

2011-09-22 Thread Chris Riccomini (JIRA)
NodeManager doesn't recognize kill -9 of AM container
-

 Key: MAPREDUCE-3072
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3072
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.0
 Environment: [criccomi@criccomi-ld trunk]$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/hadoop/common/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1174189
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 1173990
Last Changed Date: 2011-09-22 01:25:20 -0700 (Thu, 22 Sep 2011)

Reporter: Chris Riccomini


If I kill -9 my application master's pid, the NM continues reporting that the 
container is running. I assume it should report back to the RM that the AM has 
died. Instead, it continues sending this status:


2011-09-22 09:33:13,352 INFO  nodemanager.NodeStatusUpdaterImpl 
(NodeStatusUpdaterImpl.java:getNodeStatus(222)) - Sending out status for 
container: container_id {, app_attempt_id {, application_id {, id: 1, 
cluster_timestamp: 1316707951832, }, attemptId: 1, }, id: 1, }, state: 
C_RUNNING, diagnostics: \n, exit_status: -1000, 

2011-09-22 09:33:13,682 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 27263 for 
container-id container_1316707951832_0001_01_01 : Virtual 0 bytes, limit : 
2147483648 bytes; Physical 0 bytes, limit -1 bytes

This status keeps being sent forever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-3072) NodeManager doesn't recognize kill -9 of AM container

2011-09-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3072.


Resolution: Duplicate

This is the same as MAPREDUCE-3031.

 NodeManager doesn't recognize kill -9 of AM container
 -

 Key: MAPREDUCE-3072
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3072
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.0
 Environment: [criccomi@criccomi-ld trunk]$ svn info
 Path: .
 URL: http://svn.apache.org/repos/asf/hadoop/common/trunk
 Repository Root: http://svn.apache.org/repos/asf
 Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
 Revision: 1174189
 Node Kind: directory
 Schedule: normal
 Last Changed Author: szetszwo
 Last Changed Rev: 1173990
 Last Changed Date: 2011-09-22 01:25:20 -0700 (Thu, 22 Sep 2011)
Reporter: Chris Riccomini

 If I kill -9 my application master's pid, the NM continues reporting that the 
 container is running. I assume it should probably instead report back to the 
 RM that the AM has died. Instead, it continues sending this status:
 2011-09-22 09:33:13,352 INFO  nodemanager.NodeStatusUpdaterImpl 
 (NodeStatusUpdaterImpl.java:getNodeStatus(222)) - Sending out status for 
 container: container_id {, app_attempt_id {, application_id {, id: 1, 
 cluster_timestamp: 1316707951832, }, attemptId: 1, }, id: 1, }, state: 
 C_RUNNING, diagnostics: \n, exit_status: -1000, 
 2011-09-22 09:33:13,682 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 27263 for 
 container-id container_1316707951832_0001_01_01 : Virtual 0 bytes, limit 
 : 2147483648 bytes; Physical 0 bytes, limit -1 bytes
 This status keeps being sent forever.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (MAPREDUCE-2790) [MR-279] Add additional field for storing the AM/job history info on CLI

2011-09-22 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved MAPREDUCE-2790.
-

Resolution: Duplicate

 [MR-279] Add additional field for storing the AM/job history info on CLI
 

 Key: MAPREDUCE-2790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Ravi Prakash
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2790.v1.txt, MAPREDUCE-2790.v2.txt, 
 MAPREDUCE-2790.v3.txt, MAPREDUCE-2790.v4.txt


 bin/mapred job [-list [all]] displays the AM or job history location in the 
 SchedulingInfo field. An additional column has to be added to display the 
 AM/job history information. Currently, the output reads:
 {noformat}
 JobId    State    StartTime   UserName   Queue     Priority   SchedulingInfo
 jobID    FAILED   0           ramya      default   NORMAL     AM information/job history location
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3073) Build failure for MRv1 caused due to changes to MRConstants.

2011-09-22 Thread Mahadev konar (JIRA)
Build failure for MRv1 caused due to changes to MRConstants.


 Key: MAPREDUCE-3073
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3073
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 0.23.0


When running ant -Dresolvers=internal binary, the build seems to be failing 
with:

  [javac] public class JobTracker implements MRConstants, InterTrackerProtocol,
   [javac]^
   [javac] /home/y/var/builds/thread2/workspace/Cloud-Yarn-0.23-Secondary/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/TaskTracker.java:131: interface expected here
   [javac] implements MRConstants, TaskUmbilicalProtocol, Runnable, TTConfig {
   [javac]^
   [javac] /home/y/var/builds/thread2/workspace/Cloud-Yarn-0.23-Secondary/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/TaskTracker.java:552: cannot find symbol
   [javac] symbol  : variable WORKDIR
   [javac] location: class org.apache.hadoop.mapred.MRConstants
   [javac] return getLocalJobDir(user, jobid) + Path.SEPARATOR + MRConstants.WORKDIR;
   [javac]^

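"interface expected here" is the standard javac complaint when a name in an implements clause refers to a class. The sketch below uses hypothetical names, not the actual Hadoop fix, to show the shape of the change callers need once MRConstants becomes a class instead of an interface:

// Hypothetical names for illustration only: once the constants type is a class
// rather than an interface, callers stop inheriting its fields via "implements"
// and qualify them with the class name instead.
class MRConstantsSketch {
    static final String WORKDIR = "work";
}

class TaskTrackerSketch /* no longer "implements MRConstantsSketch" */ {
    String getLocalWorkDir(String jobDir) {
        // Previously WORKDIR was inherited; now it is explicitly qualified.
        return jobDir + "/" + MRConstantsSketch.WORKDIR;
    }
}

public class Mapreduce3073Sketch {
    public static void main(String[] args) {
        System.out.println(new TaskTrackerSketch().getLocalWorkDir("/tmp/job_1"));
    }
}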

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3074) add location to web UI so you know where you are - cluster, node, AM, job history

2011-09-22 Thread Thomas Graves (JIRA)
add location to web UI so you know where you are - cluster, node, AM, job 
history
-

 Key: MAPREDUCE-3074
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3074
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
 Fix For: 0.23.0


Right now, if you go to any of the web UIs for the resource manager, node manager, 
app master, or job history, they look very similar, and sometimes it's hard to 
tell which page you are on. Adding a title or something that lets you know would 
be helpful, or somehow make them more seamless so one doesn't have to know.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3075) Web UI menu inconsistencies

2011-09-22 Thread Thomas Graves (JIRA)
Web UI menu inconsistencies
---

 Key: MAPREDUCE-3075
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3075
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
 Fix For: 0.23.0


When you go to the various web UIs, the menus on the left are inconsistent and 
(at least to me) sometimes confusing. For instance, if you go to the 
application master UI, one of the menus is Cluster. If you click on one of the 
Cluster links, it takes you back to the RM UI and you lose the app master UI 
altogether. Maybe it's just me, but that is confusing. I like having a link back 
to the cluster from the AM, but the way the UI is set up I would have expected it to 
just open that page in the middle div/frame and leave the AM menus there. 
Perhaps a different type of link or menu could indicate this is going to take you 
away from the AM page.


Also, the nodes and job history UIs don't have the Cluster menu at all.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Hadoop-Mapreduce-trunk-Commit - Build # 957 - Still Failing

2011-09-22 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/957/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14071 lines...]
[junit] Running org.apache.hadoop.mapred.TestMapRed
[junit] Tests run: 5, Failures: 2, Errors: 3, Time elapsed: 1.312 sec
[junit] Test org.apache.hadoop.mapred.TestMapRed FAILED
[junit] Running org.apache.hadoop.mapred.TestMiniMRDFSCaching
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 32.756 sec
[junit] Running org.apache.hadoop.mapred.TestQueueAclsForCurrentUser
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.585 sec
[junit] Running org.apache.hadoop.mapred.TestRackAwareTaskPlacement
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.52 sec
[junit] Running org.apache.hadoop.mapred.TestReduceFetchFromPartialMem
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 36.253 sec
[junit] Running org.apache.hadoop.mapred.TestReduceTask
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.621 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileAsBinaryInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.771 sec
[junit] Running 
org.apache.hadoop.mapred.TestSequenceFileAsBinaryOutputFormat
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.941 sec
[junit] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.886 sec
[junit] Running org.apache.hadoop.mapred.TestSeveral
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 43.063 sec
[junit] Running org.apache.hadoop.mapred.TestSpeculativeExecution
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.157 sec
[junit] Running org.apache.hadoop.mapred.TestTaskLimits
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.895 sec
[junit] Running org.apache.hadoop.mapred.TestTaskTrackerBlacklisting
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.707 sec
[junit] Running org.apache.hadoop.mapred.TestTextInputFormat
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 56.258 sec
[junit] Running org.apache.hadoop.mapred.TestTextOutputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.178 sec
[junit] Running org.apache.hadoop.mapred.TestTrackerBlacklistAcrossJobs
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 46.851 sec
[junit] Running org.apache.hadoop.mapreduce.TestCounters
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.344 sec
[junit] Running org.apache.hadoop.mapreduce.TestMapCollection
[junit] Tests run: 11, Failures: 0, Errors: 11, Time elapsed: 0.616 sec
[junit] Test org.apache.hadoop.mapreduce.TestMapCollection FAILED
[junit] Running org.apache.hadoop.mapreduce.TestMapReduceLocal
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.336 sec
[junit] Running org.apache.hadoop.mapreduce.lib.input.TestFileInputFormat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.967 sec
[junit] Running 
org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.583 sec

checkfailure:
[touch] Creating 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build/test/testsfailed

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:792:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:755:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/hadoop-mapreduce-project/build.xml:816:
 Tests failed!

Total time: 6 minutes 11 seconds
Build step 'Execute shell' marked build as failure
Recording test results
Updating MAPREDUCE-3073
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
All tests passed


Matching ResourceRequest to Container

2011-09-22 Thread Chris Riccomini
Hey Guys,

I’m sure there’s a way to do this, but I’m missing it.

If I have an AllocateRequest with multiple ResourceRequests in it (say 2), and 
I get two containers back, how do I map which container was sent back for which 
ResourceRequest?

Thanks!
Chris


Re: Matching ResourceRequest to Container

2011-09-22 Thread Arun C Murthy
You are running into MR-2616.

There is a stale patch; it should be easy to fix. I can do it, or you are welcome to.

Arun

On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:

 Hey Guys,
 
 I’m sure there’s a way to do this, but I’m missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2), 
 and I get two containers back, how do I map which container was sent back for 
 which ResourceRequest?
 
 Thanks!
 Chris



Re: Matching ResourceRequest to Container

2011-09-22 Thread Chris Riccomini
Hey Arun,

MR-2616 looks like a gridmix bug. Are you sure this is the right ticket
number?

Thanks!
Chris


On 9/22/11 2:04 PM, Arun C Murthy a...@hortonworks.com wrote:

 You are running into MR-2616.
 
 There is a stale patch, should be easy to fix - I can do it, you are welcome
 to.
 
 Arun
 
 On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:
 
 Hey Guys,
 
 I'm sure there's a way to do this, but I'm missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2),
 and I get two containers back, how do I map which container was sent back for
 which ResourceRequest?
 
 Thanks!
 Chris
 



Re: Matching ResourceRequest to Container

2011-09-22 Thread Arun C Murthy
Oops, sorry - it's https://issues.apache.org/jira/browse/MAPREDUCE-2646

Arun

On Sep 22, 2011, at 2:11 PM, Chris Riccomini wrote:

 Hey Arun,
 
 MR-2616 looks like a gridmix bug. Are you sure this is the right ticket
 number?
 
 Thanks!
 Chris
 
 
 On 9/22/11 2:04 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 You are running into MR-2616.
 
 There is a stale patch, should be easy to fix - I can do it, you are welcome
 to.
 
 Arun
 
 On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:
 
 Hey Guys,
 
 I’m sure there’s a way to do this, but I’m missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2),
 and I get two containers back, how do I map which container was sent back 
 for
 which ResourceRequest?
 
 Thanks!
 Chris
 
 



Re: Matching ResourceRequest to Container

2011-09-22 Thread Arun C Murthy
Also, for now you can assume it's the highest outstanding priority or wait for 
MR-2646.

On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:

 Hey Guys,
 
 I’m sure there’s a way to do this, but I’m missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2), 
 and I get two containers back, how do I map which container was sent back for 
 which ResourceRequest?
 
 Thanks!
 Chris



Re: Matching ResourceRequest to Container

2011-09-22 Thread Chris Riccomini
Hey Arun,

I think I see. Basically, 2646's patch is using the priority number as an ID
for a container (or group of containers) within an AllocateRequest.

So, in my scenario, I could set ResourceRequest 1 to priority 1, and
ResourceRequest 2 to priority 2, and (with this patch) get the priority back
out on the other end. Is this correct?

Are priorities cross-application, or just for containers within the
application?

Cheers,
Chris
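
A minimal sketch of that pattern, written against the AMRMClient helper from later Hadoop releases (so the class and method names are assumptions relative to the 0.23 API discussed here): each request carries its own Priority, and allocated containers are bucketed back by Container.getPriority().

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PriorityMatchingSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();
        rm.registerApplicationMaster("localhost", 0, "");

        // One priority per logical request so the answers can be told apart.
        Priority p1 = Priority.newInstance(1);
        Priority p2 = Priority.newInstance(2);
        rm.addContainerRequest(new ContainerRequest(Resource.newInstance(1024, 1), null, null, p1));
        rm.addContainerRequest(new ContainerRequest(Resource.newInstance(2048, 1), null, null, p2));

        // Heartbeat; allocated containers come back tagged with the priority
        // of the request they satisfy.
        AllocateResponse response = rm.allocate(0.0f);
        for (Container c : response.getAllocatedContainers()) {
            if (c.getPriority().getPriority() == 1) {
                // launch the work that was requested at priority 1
            } else if (c.getPriority().getPriority() == 2) {
                // launch the work that was requested at priority 2
            }
        }
    }
}

Note that, per Arun's reply below, priority also determines serving order (P0 before P1), so it is not a purely opaque tag.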

On 9/22/11 2:13 PM, Arun C Murthy a...@hortonworks.com wrote:

 Also, for now you can assume it's the highest outstanding priority or wait for
 MR-2646.
 
 On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:
 
 Hey Guys,
 
  I'm sure there's a way to do this, but I'm missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2),
 and I get two containers back, how do I map which container was sent back for
 which ResourceRequest?
 
 Thanks!
 Chris
 



Re: Matching ResourceRequest to Container

2011-09-22 Thread Arun C Murthy
Priorities are within your application.

You always get P0 containers before you get P1 containers, and so on.

Arun

On Sep 22, 2011, at 2:23 PM, Chris Riccomini wrote:

 Hey Arun,
 
 I think I see. Basically, 2646's patch is using the priority number as an ID
 for a container (or group of containers) within an AllocateRequest.
 
 So, in my scenario, I could set ResourceRequest 1 to priority 1, and
 ResourceRequest 2 to priority 2, and (with this patch) get the priority back
 out on the other end. Is this correct?
 
 Are priorities cross-application, or just for containers within the
 application?
 
 Cheers,
 Chris
 
 On 9/22/11 2:13 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 Also, for now you can assume it's the highest outstanding priority or wait 
 for
 MR-2646.
 
 On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:
 
 Hey Guys,
 
 I’m sure there’s a way to do this, but I’m missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2),
 and I get two containers back, how do I map which container was sent back 
 for
 which ResourceRequest?
 
 Thanks!
 Chris
 
 



Re: Matching ResourceRequest to Container

2011-09-22 Thread Chris Riccomini
Good deal, thanks.

PS: Haven't forgotten about the MALLOC stuff; it's just lower on the priority list at the
moment. More to come.


On 9/22/11 2:32 PM, Arun C Murthy a...@hortonworks.com wrote:

 Priorities are within you application.
 
 You always get P0 containers before you get P1 containers and so on
 
 Arun
 
 On Sep 22, 2011, at 2:23 PM, Chris Riccomini wrote:
 
 Hey Arun,
 
 I think I see. Basically, 2646's patch is using the priority number as an ID
 for a container (or group of containers) within an AllocateRequest.
 
 So, in my scenario, I could set ResourceRequest 1 to priority 1, and
 ResourceRequest 2 to priority 2, and (with this patch) get the priority back
 out on the other end. Is this correct?
 
 Are priorities cross-application, or just for containers within the
 application?
 
 Cheers,
 Chris
 
 On 9/22/11 2:13 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 Also, for now you can assume it's the highest outstanding priority or wait
 for
 MR-2646.
 
 On Sep 22, 2011, at 1:44 PM, Chris Riccomini wrote:
 
 Hey Guys,
 
  I'm sure there's a way to do this, but I'm missing it.
 
 If I have an AllocateRequest with multiple ResourceRequests in it (say 2),
 and I get two containers back, how do I map which container was sent back
 for
 which ResourceRequest?
 
 Thanks!
 Chris
 
 
 



[jira] [Resolved] (MAPREDUCE-2717) Client should be able to know why an AM crashed.

2011-09-22 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved MAPREDUCE-2717.
--

Resolution: Duplicate
  Assignee: (was: Siddharth Seth)

Most are fixed; the diagnostics part is now a dup of MAPREDUCE-3065.

 Client should be able to know why an AM crashed.
 

 Key: MAPREDUCE-2717
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2717
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Amol Kekre
Priority: Blocker
 Fix For: 0.23.0


 Today if an AM crashes, we have to dig through logs, which is very cumbersome. 
 It would be good to have the client print some reason for the AM crash. 
 Various possible reasons for an AM crash:
  (1) The AM container failed during localization itself.
  (2) The AM container launched but failed before properly starting, e.g. due 
 to classpath issues.
  (3) The AM failed after starting properly.
  (4) The AM expired and was killed by the RM.
 Potential fixes:
  - For (1) and (2), the client should obtain the container status, container 
 diagnostics, and exit code (see the sketch after this description).
  - For (3), the AM should set some kind of reason for failure during its 
 heartbeat to the RM, and the client should obtain
 the same from RM.
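
For fixes (1) and (2), the client-side lookup is essentially one call. A minimal sketch, assuming the YarnClient API from later Hadoop releases and a hypothetical application id:

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmDiagnosticsSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();
        // Hypothetical application id, for illustration only.
        ApplicationId appId = ApplicationId.newInstance(1316637655278L, 1);
        ApplicationReport report = yarn.getApplicationReport(appId);
        // Diagnostics set by the RM/NM explain why the AM attempt ended.
        System.out.println(report.getYarnApplicationState() + ": " + report.getDiagnostics());
        yarn.stop();
    }
}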
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3076) TestSleepJob fails

2011-09-22 Thread Arun C Murthy (JIRA)
TestSleepJob fails 
---

 Key: MAPREDUCE-3076
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3076
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.205.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.20.205.0
 Attachments: MAPREDUCE-3076.patch

TestSleepJob fails; it was intended to be used in other tests for 
MAPREDUCE-2981.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3077) re-enable faulty TaskTracker storage without restarting TT, when appropriate

2011-09-22 Thread Matt Foley (JIRA)
re-enable faulty TaskTracker storage without restarting TT, when appropriate


 Key: MAPREDUCE-3077
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3077
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.205.0
Reporter: Matt Foley


In MAPREDUCE-2928, Ravi Gummadi proposed:
bq. we can add LocalStorage.checkBadLocalDirs() call to TT.initialize() that 
can do disk-health-check of bad local dirs and add dirs to the good local dirs 
list if they become good.
and Eli Collins added:
bq. Sounds good. Since transient disk failures may cause a file system to 
become read-only (causing permanent failures) sometimes re-mounting is 
sufficient to recover in which case it makes sense to re-enable faulty disks 
w/o TT restart.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [jira] [Created] (MAPREDUCE-3077) re-enable faulty TaskTracker storage without restarting TT, when appropriate

2011-09-22 Thread Koji Noguchi



On 9/22/11 6:19 PM, Matt Foley (JIRA) j...@apache.org wrote:

 re-enable faulty TaskTracker storage without restarting TT, when appropriate
 
 
  Key: MAPREDUCE-3077
  URL: https://issues.apache.org/jira/browse/MAPREDUCE-3077
  Project: Hadoop Map/Reduce
   Issue Type: Improvement
   Components: tasktracker
 Affects Versions: 0.20.205.0
 Reporter: Matt Foley
 
 
 In MAPREDUCE-2928, Ravi Gummadi proposed:
 bq. we can add LocalStorage.checkBadLocalDirs() call to TT.initialize() that
 can do disk-health-check of bad local dirs and add dirs to the good local dirs
 list if they become good.
 and Eli Collins added:
 bq. Sounds good. Since transient disk failures may cause a file system to
 become read-only (causing permanent failures) sometimes re-mounting is
 sufficient to recover in which case it makes sense to re-enable faulty disks
 w/o TT restart.
 
 --
 This message is automatically generated by JIRA.
 For more information on JIRA, see: http://www.atlassian.com/software/jira