[jira] [Commented] (YARN-902) Used Resources field in Resourcemanager scheduler UI not displaying any values
[ https://issues.apache.org/jira/browse/YARN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701988#comment-13701988 ]

Thomas Graves commented on YARN-902:

Are you using the latest branch-2 or the released 2.0.5-alpha? This might be a duplicate of YARN-764.

Used Resources field in Resourcemanager scheduler UI not displaying any values
Key: YARN-902
URL: https://issues.apache.org/jira/browse/YARN-902
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor

Used Resources field in Resourcemanager scheduler UI not displaying any values.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-347:
Issue Type: Improvement (was: Bug)

YARN node CLI should also show CPU info as memory info in node status
Key: YARN-347
URL: https://issues.apache.org/jira/browse/YARN-347
Project: Hadoop YARN
Issue Type: Improvement
Components: client
Reporter: Junping Du
Assignee: Junping Du
Attachments: YARN-347.patch

With YARN-2 checked in, CPU info is now taken into consideration in resource scheduling. {{yarn node -status NodeID}} should show CPU usage and capacity info, just as it shows memory info.
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702158#comment-13702158 ]

Omkar Vinit Joshi commented on YARN-644:

[~josephkniest] In these scenarios we should simply reject the client request. If the client has fabricated the container token or NMToken (YARN-613), then these scenarios (NPE) are quite possible. Ideally we should reject them as invalid tokens.

Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
Key: YARN-644
URL: https://issues.apache.org/jira/browse/YARN-644
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Priority: Minor

I see that validation/null checks are not performed on passed-in parameters, e.g. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest(). I guess we should add these checks.
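A minimal sketch of the kind of up-front validation being discussed. The class and field names below are illustrative stand-ins, not the actual ContainerManagerImpl code: the point is to reject a request with missing token fields explicitly rather than letting an NPE propagate out of authorizeRequest().

```java
// Hypothetical sketch: validate token fields before dereferencing them.
// Names here (TokenValidation, ContainerTokenId) are illustrative only.
public final class TokenValidation {

    /** Minimal stand-in for the fields authorizeRequest() dereferences. */
    public static class ContainerTokenId {
        public Object containerId;   // would be ContainerId in YARN
        public Object appAttemptId;  // would be ApplicationAttemptId in YARN
    }

    /** Throws (standing in for an invalid-token rejection) on any null field. */
    public static void validate(ContainerTokenId tokenId) {
        if (tokenId == null || tokenId.containerId == null
                || tokenId.appAttemptId == null) {
            throw new IllegalArgumentException(
                "Invalid container token: missing required fields");
        }
    }

    /** Convenience wrapper: true if the token passes validation. */
    public static boolean isValid(ContainerTokenId tokenId) {
        try {
            validate(tokenId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A fabricated or truncated token then produces a clean rejection the RPC layer can report back to the client, instead of a server-side NullPointerException.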
[jira] [Updated] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-292:
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
Key: YARN-292
URL: https://issues.apache.org/jira/browse/YARN-292
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K

{code:xml}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702177#comment-13702177 ]

Zhijie Shen commented on YARN-502:

+1, the patch looks good to me.

RM crash with NPE on NODE_REMOVED event with FairScheduler
Key: YARN-502
URL: https://issues.apache.org/jira/browse/YARN-502
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch

While running some tests and adding/removing nodes, we saw the RM crash with the below exception. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.

{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
	at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}
[jira] [Updated] (YARN-623) NodeManagers on RM web-app don't have diagnostic information
[ https://issues.apache.org/jira/browse/YARN-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-623:
Issue Type: Improvement (was: Bug)

NodeManagers on RM web-app don't have diagnostic information
Key: YARN-623
URL: https://issues.apache.org/jira/browse/YARN-623
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
Labels: usability

If the RM for some reason asks NMs to shut down or reboot, it would be very useful to show that information on the UI so that operators can see it directly, instead of logging in to machines and looking through logs.
[jira] [Updated] (YARN-397) RM Scheduler api enhancements
[ https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-397:
Issue Type: Improvement (was: Bug)

RM Scheduler api enhancements
Key: YARN-397
URL: https://issues.apache.org/jira/browse/YARN-397
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy

Umbrella JIRA tracking enhancements to RM APIs.
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702237#comment-13702237 ]

Chris Nauroth commented on YARN-894:

Hi, Chuan. This patch looks good, but I'm seeing a failure in the test on my Windows machine. If I run just {{TestNodeHealthService#testNodeHealthScript}}, then it passes. If I run the whole {{TestNodeHealthService}} suite, then that same test fails with:

{code}
testNodeHealthScript(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService)  Time elapsed: 187 sec  ERROR!
java.io.FileNotFoundException: C:\hdc\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (The process cannot access the file because it is being used by another process)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScript(TestNodeHealthService.java:154)
{code}

Do you see this happen too? It's probably a file leak out of the prior test.

NodeHealthScriptRunner timeout checking is inaccurate on Windows
Key: YARN-894
URL: https://issues.apache.org/jira/browse/YARN-894
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch

In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout. We have the following execution sequence in Shell:
1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In the main thread, open a buffered reader and feed in the process's standard input stream.
3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread. On Windows, we don't get the IOException; only -1 is returned from the reader, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
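The platform difference described above can be sketched as a small decision function. This is illustrative only (the enum and method names are assumptions, not the real NodeHealthScriptRunner API): the idea is to consult the executor's explicit timed-out flag first, rather than inferring a timeout from which exception the platform happens to throw.

```java
// Illustrative sketch, not the actual Hadoop code: classify a health-script
// run from (exitCode, whether reading output threw, the timeout flag).
public final class HealthStatusSketch {

    public enum Status { HEALTHY, UNHEALTHY, TIMED_OUT, FAILED }

    /**
     * exitCode: script exit code (0 = healthy);
     * threwIoException: reading the script's output stream failed;
     * timedOut: the executor's explicit timeout flag.
     */
    public static Status classify(int exitCode, boolean threwIoException,
                                  boolean timedOut) {
        // Check the timeout flag first: on Windows the killed process just
        // ends the stream (read returns -1) with no IOException, so the
        // flag is the only reliable timeout signal on both platforms.
        if (timedOut) {
            return Status.TIMED_OUT;
        }
        if (threwIoException) {
            return Status.FAILED;
        }
        return exitCode == 0 ? Status.HEALTHY : Status.UNHEALTHY;
    }
}
```

With this ordering, the Linux path (IOException plus timeout flag) and the Windows path (clean -1 EOF plus timeout flag) both classify as TIMED_OUT.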
[jira] [Updated] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-865:
Issue Type: Improvement (was: Bug)

RM webservices can't query on application Types
Key: YARN-865
URL: https://issues.apache.org/jira/browse/YARN-865
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: MR-5337.1.patch

The ResourceManager web service API to get the list of apps doesn't have a query parameter for appTypes.
[jira] [Updated] (YARN-843) TestPipeApplication should not be using AMRMToken.
[ https://issues.apache.org/jira/browse/YARN-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-843:
Issue Type: Improvement (was: Bug)

TestPipeApplication should not be using AMRMToken.
Key: YARN-843
URL: https://issues.apache.org/jira/browse/YARN-843
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Omkar Vinit Joshi

[YARN-822 comment | https://issues.apache.org/jira/browse/YARN-822?focusedCommentId=13685802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13685802] Maybe we can just remove the token usage.
[jira] [Updated] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-807:
Issue Type: Improvement (was: Bug)

When querying apps by queue, iterating over all apps is inefficient and limiting
Key: YARN-807
URL: https://issues.apache.org/jira/browse/YARN-807
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

The question "which apps are in queue X" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications, so I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. It would also have the advantage of allowing a parent queue to return all the applications in leaf queues under it, and of allowing queue-name aliases, as in the way that root.default and default refer to the same queue in the fair scheduler.
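The queue-to-applications mapping mentioned above can be sketched as a simple index. The names here are hypothetical (this is not the actual scheduler API): the point is that a maintained index answers "apps in queue X" per queue, instead of filtering every RMApp on each query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a scheduler-side queue -> applications index;
// class and method names are illustrative, not the real YARN API.
public final class QueueIndex {

    private final Map<String, Set<String>> appsByQueue = new HashMap<>();

    /** Record an application as belonging to a queue (on submission). */
    public void addApp(String queue, String appId) {
        appsByQueue.computeIfAbsent(queue, q -> new HashSet<>()).add(appId);
    }

    /** Direct per-queue lookup, versus scanning every RMApp and filtering. */
    public Set<String> getAppsInQueue(String queue) {
        return appsByQueue.getOrDefault(queue, Collections.emptySet());
    }
}
```

A parent queue could then be answered by unioning the index entries of its leaf queues, and aliases (root.default vs. default) by normalizing the key before lookup.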
[jira] [Updated] (YARN-115) yarn commands shouldn't add m to the heapsize
[ https://issues.apache.org/jira/browse/YARN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-115:
Issue Type: Improvement (was: Bug)

yarn commands shouldn't add m to the heapsize
Key: YARN-115
URL: https://issues.apache.org/jira/browse/YARN-115
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.3
Reporter: Thomas Graves
Labels: usability

The yarn commands add "m" to the heapsize. This is unlike the HDFS side, and unlike what the old JT/TT scripts used to do.

{noformat}
JAVA_HEAP_MAX=-Xmx$YARN_RESOURCEMANAGER_HEAPSIZEm
JAVA_HEAP_MAX=-Xmx$YARN_NODEMANAGER_HEAPSIZEm
{noformat}

We should not add in the "m", and should allow the user to specify units.
[jira] [Updated] (YARN-148) CapacityScheduler shouldn't explicitly need YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-148:
Issue Type: Improvement (was: Bug)

CapacityScheduler shouldn't explicitly need YarnConfiguration
Key: YARN-148
URL: https://issues.apache.org/jira/browse/YARN-148
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

This was done in MAPREDUCE-3773. None of our service APIs warrant YarnConfiguration. We effect the proper loading of yarn-site.xml by explicitly creating YarnConfiguration in all the main classes - ResourceManager, NodeManager etc. Due to this extra dependency, tests are failing, see https://builds.apache.org/job/PreCommit-YARN-Build/74//testReport/org.apache.hadoop.yarn.client/TestYarnClient/testClientStop/.
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702245#comment-13702245 ]

Zhijie Shen commented on YARN-295:

I agree with moving RMAppAttempt from ALLOCATED to FAILED through AMContainerCrashedTransition. WRT the test, is the following not necessary?

{code}
+launchApplicationAttempt(amContainer);
+runApplicationAttempt(amContainer, host, 8042, oldtrackingurl);
{code}

See testAllocatedToFailed.

Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
Key: YARN-295
URL: https://issues.apache.org/jira/browse/YARN-295
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch

{code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Updated] (YARN-8) Add more unit tests for CPU scheduling in CS
[ https://issues.apache.org/jira/browse/YARN-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-8:
Issue Type: Improvement (was: Bug)

Add more unit tests for CPU scheduling in CS
Key: YARN-8
URL: https://issues.apache.org/jira/browse/YARN-8
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

Companion to YARN-2.
[jira] [Updated] (YARN-601) Refactoring the code which computes the user file cache and user application file cache paths
[ https://issues.apache.org/jira/browse/YARN-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-601:
Issue Type: Improvement (was: Bug)

Refactoring the code which computes the user file cache and user application file cache paths
Key: YARN-601
URL: https://issues.apache.org/jira/browse/YARN-601
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Omkar Vinit Joshi
Priority: Minor

At present, the user local cache file path and user application file path are computed in multiple places. It would be better to expose them as single static utility methods and reuse them everywhere else. Locations:
* ContainerLaunch
* DefaultContainerExecutor: this already has some methods like this
* ResourceLocalizationService: getUserCacheFilePath, getAppFileCachePath
* ContainerLocalizer
* ShuffleHandler.Shuffle
* TestContainerLocalizer, TestContainerManager, TestDefaultContainerExecutor and TestResourceLocalizationService
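A sketch of what such a shared utility might look like. The usercache/&lt;user&gt;/{filecache, appcache/&lt;appId&gt;} layout matches the NodeManager's local-dir structure, but the class and method names below are illustrative, not the actual utility proposed in the JIRA.

```java
// Hedged sketch of a single static utility for the cache paths that are
// currently computed independently in several classes. "/" is used as the
// separator for determinism in this example.
public final class LocalCachePaths {

    private LocalCachePaths() {} // static utility, no instances

    /** Per-user public file cache: <localDir>/usercache/<user>/filecache */
    public static String userFileCachePath(String localDir, String user) {
        return localDir + "/usercache/" + user + "/filecache";
    }

    /** Per-app cache: <localDir>/usercache/<user>/appcache/<appId> */
    public static String appFileCachePath(String localDir, String user,
                                          String appId) {
        return localDir + "/usercache/" + user + "/appcache/" + appId;
    }
}
```

Centralizing the layout this way means a future change to the directory structure touches one class instead of ContainerLaunch, DefaultContainerExecutor, ResourceLocalizationService, ContainerLocalizer, ShuffleHandler and the tests.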
[jira] [Updated] (YARN-545) NodeResourceMonitor and its Impl are empty and may be removed
[ https://issues.apache.org/jira/browse/YARN-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-545:
Issue Type: Improvement (was: Bug)

NodeResourceMonitor and its Impl are empty and may be removed
Key: YARN-545
URL: https://issues.apache.org/jira/browse/YARN-545
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Bikas Saha
Priority: Minor
[jira] [Updated] (YARN-100) container-executor should deal with stdout, stderr better
[ https://issues.apache.org/jira/browse/YARN-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-100:
Issue Type: Improvement (was: Bug)

container-executor should deal with stdout, stderr better
Key: YARN-100
URL: https://issues.apache.org/jira/browse/YARN-100
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.0.1-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

container-executor.c contains the following code:

{code}
fclose(stdin);
fflush(LOGFILE);
if (LOGFILE != stdout) {
  fclose(stdout);
}
if (ERRORFILE != stderr) {
  fclose(stderr);
}
if (chdir(primary_app_dir) != 0) {
  fprintf(LOGFILE, "Failed to chdir to app dir - %s\n", strerror(errno));
  return -1;
}
execvp(args[0], args);
{code}

Whenever you open a new file descriptor, it gets the lowest available number. So if {{stdout}} (fd number 1) has been closed, and you do open("/my/important/file"), you'll get assigned file descriptor 1. This means that any printf statements in the program will now be printing to /my/important/file. Oops! The correct way to get rid of stdin, stdout, or stderr is not to close them, but to make them point to /dev/null; {{dup2}} can be used for this purpose. It looks like LOGFILE and ERRORFILE are always set to stdout and stderr at the moment. However, this is a latent bug that should be fixed in case these are ever made configurable (which seems to have been the intent).
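The dup2 approach described above can be sketched as follows. This is an illustrative fix under the stated assumptions, not the actual container-executor patch: the standard descriptors are pointed at /dev/null instead of being closed, so a later open() can never be assigned fd 0/1/2 and silently receive printf output.

```c
/* Sketch: silence the standard streams safely with dup2 rather than fclose. */
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure. After this call, fds 0/1/2 all refer
 * to /dev/null, so any stray printf/perror output is harmlessly discarded
 * and newly opened files are guaranteed an fd above 2. */
static int silence_std_streams(void) {
    int devnull = open("/dev/null", O_RDWR);
    if (devnull < 0) {
        return -1;
    }
    /* dup2 atomically closes the target fd and makes it a copy of devnull. */
    if (dup2(devnull, STDIN_FILENO) < 0 ||
        dup2(devnull, STDOUT_FILENO) < 0 ||
        dup2(devnull, STDERR_FILENO) < 0) {
        close(devnull);
        return -1;
    }
    /* The extra descriptor is no longer needed once 0/1/2 are redirected. */
    if (devnull > STDERR_FILENO) {
        close(devnull);
    }
    return 0;
}
```

Compare with fclose(stdout): after silence_std_streams(), opening "/my/important/file" yields an fd greater than 2, so printf cannot leak into it.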
[jira] [Resolved] (YARN-52) Seed TestYarnClient with tests
[ https://issues.apache.org/jira/browse/YARN-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved YARN-52.
Resolution: Duplicate

Some tests were already added via recent tickets. Closing this.

Seed TestYarnClient with tests
Key: YARN-52
URL: https://issues.apache.org/jira/browse/YARN-52
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli

YARN-29 added TestYarnClient with no tests. We need to add client-specific tests validating the client contracts.
[jira] [Updated] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-766:
Issue Type: Improvement (was: Bug)

TestNodeManagerShutdown should use Shell to form the output path
Key: YARN-766
URL: https://issues.apache.org/jira/browse/YARN-766
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt

{{File scriptFile = new File(tmpDir, "scriptFile.sh");}} should be replaced with {{File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");}} to match trunk.
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702308#comment-13702308 ]

Omkar Vinit Joshi commented on YARN-245:

I think this will not fix the root cause. Looking at the current transitions, it seems that ApplicationImpl got 2 events (APPLICATION_FINISH) when it only expects one in its life cycle. The first event made the successful transition, but the second event, which in this case occurred at the FINISHED state, created the invalid transition. Looking at the code, it appears the code below sent two events in consecutive loop cycles (node heartbeats), which caused the problem. [~devaraj.k] Is there any way we can reproduce this? Did you see that error again?

NodeStatusUpdaterImpl.run:
{code}
if (appsToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedAppsEvent(appsToCleanup));
}
{code}

[~mayank_bansal] I think we need to fix the NodeStatusUpdaterImpl.run code. At present it doesn't check whether the NM received 2 identical responses, i.e. the NM sent a heartbeat but didn't get the response from the RM, so it sent the heartbeat again, and in turn the RM sent 2 identical responses. The side effect of this is that the NM already sent the application-finished event for the first response, which will create a problem if it tries to send it again on the next identical heartbeat.

{code}
lastHeartBeatID = response.getResponseId();
List<ContainerId> containersToCleanup = response.getContainersToCleanup();
if (containersToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedContainersEvent(containersToCleanup,
          CMgrCompletedContainersEvent.Reason.BY_RESOURCEMANAGER));
}
List<ApplicationId> appsToCleanup = response.getApplicationsToCleanup();
// Only start tracking for keepAlive on FINISH_APP
trackAppsForKeepAlive(appsToCleanup);
if (appsToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedAppsEvent(appsToCleanup));
}
{code}

I think we can reproduce this if we send the same heartbeat response again, including the application-finish event. Any thoughts?

Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
Key: YARN-245
URL: https://issues.apache.org/jira/browse/YARN-245
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-245-trunk-1.patch

{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
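The duplicate-heartbeat check being proposed can be sketched as follows. This is an illustrative sketch, not the real NodeStatusUpdaterImpl: the idea is to drop a heartbeat response whose responseId matches the last one processed, so a retransmitted RM response cannot deliver the application-cleanup list (and thus APPLICATION_FINISH) twice.

```java
// Hypothetical sketch of de-duplicating RM heartbeat responses by responseId;
// the class name and accept() method are illustrative only.
public final class HeartbeatDedup {

    private int lastHeartBeatID = -1;

    /** Returns true if this response is new and its events should be dispatched. */
    public boolean accept(int responseId) {
        if (responseId == lastHeartBeatID) {
            // Identical retransmission: the cleanup events for this response
            // were already dispatched, so processing it again would send a
            // second FINISH_APPLICATION to an already-FINISHED ApplicationImpl.
            return false;
        }
        lastHeartBeatID = responseId;
        return true;
    }
}
```

In the run loop, the containersToCleanup / appsToCleanup handling would then be guarded by accept(response.getResponseId()), making a replayed response a no-op.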
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702365#comment-13702365 ]

Alejandro Abdelnur commented on YARN-149:

Karthik, at a high level this seems OK to me. One thing I would add: for HTTP failover, if we take the RMHA wrapper approach, the wrapper, when in standby, would redirect HTTP calls to the active RM. While this does not cover rerouting when hitting an RM that crashed, it covers the common case where somebody hits the running standby.

ResourceManager (RM) High-Availability (HA)
Key: YARN-149
URL: https://issues.apache.org/jira/browse/YARN-149
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Harsh J
Assignee: Bikas Saha
Attachments: rm-ha-phase1-approach-draft1.pdf

This JIRA tracks the work needed to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to the leader, and client redirection to the new leader.
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702378#comment-13702378 ] Bikas Saha commented on YARN-149: - Thanks for the notes Karthik. I will go through it and incorporate stuff into the final document. ResourceManager (RM) High-Availability (HA) --- Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha Attachments: rm-ha-phase1-approach-draft1.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702402#comment-13702402 ] Chuan Liu commented on YARN-894: I only saw this once. After a clean rebuild, I could not reproduce the error in subsequent runs. It could be a timing issue. I think the test case had written to the same script when it failed at {{TestNodeHealthService.java:154}}. NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on exceptions thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell also has the timeout status set, we set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader over the process's output stream. 3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process. On Linux, when the timeout fires and the process is killed, the buffered reader in the main thread throws an IOException with the message "Stream closed". On Windows, there is no IOException; the reader simply returns -1, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
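The platform difference described above suggests not inferring the timeout from stream behavior at all. The sketch below (illustrative only, not the actual NodeHealthScriptRunner code; it assumes a POSIX `sleep`/`true` command for the demo) records the timeout in an explicit flag set by the timer task before the kill, then consults that flag once the read loop ends, which works the same whether the reader throws or just returns -1:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutCheckDemo {
    /** Runs a command, killing it after timeoutMs; returns true iff it timed out. */
    public static boolean runWithTimeout(long timeoutMs, String... cmd) throws Exception {
        final Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        final AtomicBoolean timedOut = new AtomicBoolean(false);
        Timer timer = new Timer(true);
        timer.schedule(new TimerTask() {
            @Override public void run() {
                timedOut.set(true);   // record the timeout explicitly...
                p.destroy();          // ...before killing the process
            }
        }, timeoutMs);
        // On Linux the reader may throw "Stream closed" after destroy();
        // on Windows it just returns end-of-stream. Either way we rely on
        // the flag, not on the stream's behavior.
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            while (r.readLine() != null) { /* drain output */ }
        } catch (IOException ignored) {
            // expected on some platforms when the timer killed the process mid-read
        }
        p.waitFor();
        timer.cancel();
        return timedOut.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + runWithTimeout(500, "sleep", "5"));
    }
}
```

The key ordering is that the flag is set before `destroy()`, so by the time the reader observes the dead process the timeout is already recorded.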
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-521: Attachment: YARN-521-1.patch Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702424#comment-13702424 ] Sandy Ryza commented on YARN-521: - I uploaded a new patch that adds in the checks mentioned by Bikas and Alejandro. I moved the parameter documentation to constructors. Regarding the priority comments, I think the new ones are clearer, but they're out of the scope of this JIRA, so I reverted them and I'll argue somewhere else if I want to make that change. Regarding renaming allRacks to dedupedRacks, the code is doing the same thing as it was before for requests with locality relaxed, but I had to move some things around to accommodate disabling locality relaxation. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
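The kind of check being discussed (requests at one priority must all agree on whether locality relaxation is enabled) can be sketched independently of the real AMRMClient. The class and field names below are illustrative stand-ins, not the actual YARN-521 patch:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityCheckDemo {
    /** Minimal stand-in for AMRMClient.ContainerRequest (hypothetical). */
    public static class Request {
        final int priority;
        final boolean relaxLocality;
        public Request(int priority, boolean relaxLocality) {
            this.priority = priority;
            this.relaxLocality = relaxLocality;
        }
    }

    // Remembers the relaxLocality setting first seen at each priority.
    private final Map<Integer, Boolean> relaxAtPriority = new HashMap<>();

    /** Rejects a request whose relaxLocality disagrees with earlier requests at the same priority. */
    public void addRequest(Request req) {
        Boolean existing = relaxAtPriority.putIfAbsent(req.priority, req.relaxLocality);
        if (existing != null && existing != req.relaxLocality) {
            throw new IllegalArgumentException(
                "Cannot mix relaxed and non-relaxed locality at priority " + req.priority);
        }
    }

    public static void main(String[] args) {
        LocalityCheckDemo demo = new LocalityCheckDemo();
        demo.addRequest(new Request(1, false));
        try {
            demo.addRequest(new Request(1, true));   // conflicting setting at priority 1
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast at add time keeps the scheduler from ever seeing an ambiguous mix of relaxed and pinned requests at one priority.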
[jira] [Updated] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-791: Attachment: YARN-791-8.patch Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702427#comment-13702427 ] Sandy Ryza commented on YARN-791: - Uploaded a patch that updates the CLI documentation. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702432#comment-13702432 ] Hadoop QA commented on YARN-521: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591273/YARN-521-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1427//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1427//console This message is automatically generated. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702443#comment-13702443 ] Jian He commented on YARN-513: -- bq. Does ResourceTrackerClientPBImpl still need a close method after this code? If yes, then why does this code not call close() instead? These two methods are separate: one is for ResourceTrackerPB.class, the other for ResourceTracker.class. We can also see this from YarnClientImpl.serviceStop and ApplicationClientProtocolPBClientImpl.close(). I have not investigated the deeper implementations. bq. Does this need an @VisibleForTesting flag? This is actually only overridden in a test class, not directly called in a test case. bq. Why has this been removed in testConnectionNMToRM? Was this redundant earlier? Does some new test code cover this check? I don't see the need to test this. The following assertion, Assert.assertTrue("NM started before updater triggered", myUpdater.isTriggered());, covers it, I think. bq. Some of the tests probably dont need to wrap the fake ResourceTracker inside a RetryProxy. Only TestNodeStatusUpdaterRetryAndNMShutdown and testNMConnectionToRM use the ResourceTracker inside a RetryProxy, which I think they should. Agreed with the other comments. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
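The RetryProxy pattern this JIRA builds on (Hadoop's org.apache.hadoop.io.retry.RetryProxy) wraps a protocol interface in a java.lang.reflect.Proxy that re-invokes failed calls under a policy. The general mechanism can be sketched with only the JDK; the names below are illustrative, not Hadoop's actual API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.net.ConnectException;

public class RetryProxyDemo {
    /** Stand-in for the real NM-to-RM protocol interface. */
    public interface ResourceTracker {
        String registerNodeManager(String nodeId) throws Exception;
    }

    /** Wraps target so each call is retried up to maxAttempts times on any exception. */
    @SuppressWarnings("unchecked")
    public static <T> T withRetries(Class<T> iface, T target, int maxAttempts) {
        InvocationHandler h = (proxy, method, args) -> {
            Throwable last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause();   // remember the real failure, then retry
                }
            }
            throw last != null ? last : new IllegalStateException("no attempts made");
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, h);
    }

    /** A tracker that fails its first two calls, as if the RM were still restarting. */
    public static class FlakyTracker implements ResourceTracker {
        public int calls = 0;
        public String registerNodeManager(String nodeId) throws Exception {
            if (++calls < 3) throw new ConnectException("RM not up yet");
            return "registered:" + nodeId;
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyTracker flaky = new FlakyTracker();
        ResourceTracker tracker = withRetries(ResourceTracker.class, flaky, 3);
        System.out.println(tracker.registerNodeManager("nm1") + " after " + flaky.calls + " attempts");
    }
}
```

Because the retry lives in the proxy, every protocol method gets the same wait-for-RM behavior without each caller repeating the loop, which is exactly why a common proxy client is worth factoring out.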
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702452#comment-13702452 ] Jian He commented on YARN-513: -- bq. Does this need an @VisibleForTesting flag? Yes, we need to mark as @VisibleForTesting. (disregard my earlier comment) Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702456#comment-13702456 ] Omkar Vinit Joshi commented on YARN-299: I guess the patch looks good overall. However, we need an additional fix for a related case. The root cause is more evident in the YARN-820 logs: a container requests multiple resources, and RESOURCE_LOCALIZED / RESOURCE_FAILED events may arrive for one or more of them between the container receiving its first RESOURCE_FAILED event and deregistering itself from the remaining resources. Therefore we might see RESOURCE_FAILED / RESOURCE_LOCALIZED events sent to ContainerImpl when it is in the DONE state (for different resources). So, like RESOURCE_FAILED, we should also ignore the RESOURCE_LOCALIZED event. I could see one more issue in the logs; it would be great if we fix that too as part of this jira, as it looks like a quick change. Here LOG.info calls toString on LocalizedResource, which is not thread-safe for ref (a LinkedList used internally). I guess grabbing the write lock inside toString would protect it from such exceptions; we need to check the other state machines as well. {code}
} catch (ExecutionException e) {
  LOG.info("Failed to download rsrc " + assoc.getResource(), e.getCause());
  LocalResourceRequest req = assoc.getResource().getRequest();
  publicRsrc.handle(new ResourceFailedLocalizationEvent(req, e.getMessage()));
  assoc.getResource().unlock();
{code} Any thoughts? 
Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch {code:xml}
2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null
{code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
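The fix pattern being discussed (treating the straggler events as no-op self-transitions at DONE instead of letting the state machine throw) can be sketched with a minimal transition table. In the real code this is an addTransition entry in ContainerImpl's StateMachineFactory; the enum and class names below are simplified stand-ins:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class IgnoreEventDemo {
    public enum State { LOCALIZING, RUNNING, DONE }
    public enum Event { RESOURCE_LOCALIZED, RESOURCE_FAILED, CONTAINER_KILLED }

    // Events that are harmless stragglers once a terminal state is reached.
    private static final Map<State, EnumSet<Event>> IGNORED = new EnumMap<>(State.class);
    static {
        // Late localization outcomes for *other* resources may still arrive after DONE.
        IGNORED.put(State.DONE, EnumSet.of(Event.RESOURCE_LOCALIZED, Event.RESOURCE_FAILED));
    }

    private State current = State.LOCALIZING;

    public State handle(Event e) {
        EnumSet<Event> ignored = IGNORED.get(current);
        if (ignored != null && ignored.contains(e)) {
            return current;            // self-transition: log-and-ignore instead of throwing
        }
        switch (current) {
            case LOCALIZING:
                if (e == Event.RESOURCE_LOCALIZED) return current = State.RUNNING;
                if (e == Event.RESOURCE_FAILED)    return current = State.DONE;
                break;
            case RUNNING:
                if (e == Event.CONTAINER_KILLED)   return current = State.DONE;
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event: " + e + " at " + current);
    }

    public static void main(String[] args) {
        IgnoreEventDemo c = new IgnoreEventDemo();
        System.out.println(c.handle(Event.RESOURCE_FAILED));     // enters DONE
        System.out.println(c.handle(Event.RESOURCE_LOCALIZED));  // ignored, stays DONE
    }
}
```

Registering the ignore explicitly, per state and per event, keeps genuinely invalid transitions loud while silencing only the known-benign race.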
[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702464#comment-13702464 ] Zhijie Shen commented on YARN-296: -- The patch should work, but IMHO the essential problem is that APP_ACCEPTED is not expected at RUNNING. APP_ACCEPTED is created during ScheduleTransition of a RMAppAttempt, and is consumed when a RMApp moves from SUBMITTED to ACCEPTED. Only after the RMApp enters ACCEPTED can it move on to RUNNING (similar for UnmanagedAM). Therefore, APP_ACCEPTED shouldn't be seen when the RMApp is at RUNNING. Moreover, it seems impossible that APP_ACCEPTED belongs to the last RMAppAttempt if the RMApp is retrying, as a retry can only happen after the RMApp enters ACCEPTED, by which point the APP_ACCEPTED produced by the last RMAppAttempt has already been consumed. [~devaraj], would you mind posting more context around the InvalidStateTransitonException, so that we can dig deeper into the problem? Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl Key: YARN-296 URL: https://issues.apache.org/jira/browse/YARN-296 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch {code:xml}
2012-12-28 11:14:47,671 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-727: --- Attachment: YARN-727.18.patch ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702511#comment-13702511 ] Xuan Gong commented on YARN-727: Thanks for the comments. Created a new patch to address the comments ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
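The behavior being added here, server-side filtering of getApplications by application type, amounts to the pattern below, sketched over plain collections rather than the real ApplicationClientProtocol records (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AppTypeFilterDemo {
    /** Minimal stand-in for an ApplicationReport (hypothetical fields). */
    public static class AppReport {
        public final String id;
        public final String type;
        public AppReport(String id, String type) { this.id = id; this.type = type; }
    }

    /** Empty filter means "all types", matching the unfiltered getAllApplications behavior. */
    public static List<AppReport> getApplications(List<AppReport> all, Set<String> types) {
        if (types == null || types.isEmpty()) return all;
        return all.stream()
                  .filter(a -> types.contains(a.type))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AppReport> all = Arrays.asList(
            new AppReport("app_1", "MAPREDUCE"), new AppReport("app_2", "SPARK"));
        System.out.println(getApplications(all, Collections.singleton("MAPREDUCE")).get(0).id);
    }
}
```

Treating an empty type set as "no filter" keeps the new parameter backward compatible with existing callers of the unfiltered API.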
[jira] [Updated] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-369: --- Attachment: YARN-369-trunk-4.patch Thanks [~bikassaha] for the review. Incorporated all the comments. Attaching the latest patch. Thanks, Mayank Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
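The guard Hitesh describes boils down to tracking registration per attempt and failing fast in allocate, instead of letting the bad event die silently in the state machine. A minimal sketch with hypothetical names (the real change lives in ApplicationMasterService):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RegisterGuardDemo {
    /** Thrown back to the AM instead of dropping an invalid STATUS_UPDATE event. */
    public static class ApplicationMasterNotRegisteredException extends RuntimeException {
        public ApplicationMasterNotRegisteredException(String msg) { super(msg); }
    }

    // Attempts that have completed registerApplicationMaster.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    public void registerApplicationMaster(String attemptId) {
        registered.add(attemptId);
    }

    public String allocate(String attemptId) {
        if (!registered.contains(attemptId)) {
            throw new ApplicationMasterNotRegisteredException(
                "AM " + attemptId + " called allocate before registering");
        }
        return "ok";   // would build an AllocateResponse in the real service
    }

    public static void main(String[] args) {
        RegisterGuardDemo svc = new RegisterGuardDemo();
        try {
            svc.allocate("attempt_1");
        } catch (ApplicationMasterNotRegisteredException e) {
            System.out.println(e.getMessage());
        }
        svc.registerApplicationMaster("attempt_1");
        System.out.println(svc.allocate("attempt_1"));
    }
}
```

Raising a typed exception gives the misbehaving AM an actionable error at the RPC boundary, rather than a server-side log line nobody sees.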
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702514#comment-13702514 ] Xuan Gong commented on YARN-727: Related MR changes are in https://issues.apache.org/jira/browse/MAPREDUCE-5325 Most of the MR changes are for the API rename ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-295: --- Attachment: YARN-295-trunk-3.patch Thanks [~zjshen] for review. Updated the patch. Thanks, Mayank Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702529#comment-13702529 ] Hadoop QA commented on YARN-791: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591279/YARN-791-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1428//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1428//console This message is automatically generated. 
Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702532#comment-13702532 ] Omkar Vinit Joshi commented on YARN-814: * Instead of putting string messages saying where this log came from, we can use the log4j %L option if suitable/required. [http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html] Similarly for other LOG.warn places. {code}
- LOG.warn("Exit code from container is : " + exitCode);
+ LOG.warn("Exit code from LinuxContainerExecutor's deleteAsUser is : " + exitCode);
{code} * I guess it will be more helpful if we add the containerId there (locId = containerId). The rest of the patch looks good. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
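Omkar's suggestion refers to log4j 1.2 PatternLayout's location conversion characters: %L emits the line number and %M the method name of the call site, so log messages need not hand-encode their origin. A hypothetical log4j.properties fragment (the appender name here is invented for illustration):

```properties
# %M and %L make log4j record the calling method and line automatically.
# NOTE: location capture (%l, %L, %M, %F) is documented as expensive.
log4j.appender.NM.layout=org.apache.log4j.PatternLayout
log4j.appender.NM.layout.ConversionPattern=%d{ISO8601} %p %c{2} (%M:%L) - %m%n
```

The documented cost of location capture is why projects often keep it out of hot paths and reserve it for warn/error appenders.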
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702537#comment-13702537 ] Vinod Kumar Vavilapalli commented on YARN-791: -- The latest patch looks good to me. Checking it in. Regarding the CLI stuff, can you look at the latest patch at YARN-727? We will need to add something like --node-state similar to --app-type there. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702540#comment-13702540 ] Hadoop QA commented on YARN-369: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591295/YARN-369-trunk-4.patch against trunk revision. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1429//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1429//console This message is automatically generated. 
Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
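The fix YARN-369 asks for — rejecting calls from unregistered application masters with a proper error instead of a silently dropped state-machine exception — can be sketched in standalone form (all class and method names here are hypothetical, not the actual patch):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch: guard allocate/status-update calls so
// that an attempt which never registered gets an explicit error back,
// instead of triggering an InvalidStateTransitonException that the
// dispatcher silently drops.
public class AmGuardSketch {
  private final Set<String> registeredAttempts = new HashSet<>();

  void registerApplicationMaster(String attemptId) {
    registeredAttempts.add(attemptId);
  }

  String allocate(String attemptId) {
    if (!registeredAttempts.contains(attemptId)) {
      // Surface a clear error to the caller rather than feeding the
      // attempt's state machine an event it cannot handle.
      throw new IllegalStateException(
          "Application Master is trying to allocate before registering: "
              + attemptId);
    }
    return "allocated for " + attemptId;
  }

  public static void main(String[] args) {
    AmGuardSketch svc = new AmGuardSketch();
    svc.registerApplicationMaster("attempt_1");
    System.out.println(svc.allocate("attempt_1"));
    try {
      svc.allocate("attempt_2"); // never registered
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```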
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702543#comment-13702543 ] Hitesh Shah commented on YARN-727: --
- Use MRJobConfig.MR_APPLICATION_TYPE instead of MAPREDUCE.
- The documentation for ApplicationClientProtocol#getApplications does not seem to be correct. It does not mention usage in terms of filtering based on criteria defined in the request object. It mentions that it returns only running applications - is that correct?
- Likewise in YarnClient. Doc changes:
{code}
+ * Get a report (ApplicationReport) of Applications
+ * about the given applicationTypes in the cluster.
{code}
The above should be reworded to something along the lines of getting reports of applications matching the given application types.
- In YarnClientImpl, should code be re-used across the 2 getApplication* functions?
- Why does APP_TYPE_CMD need to be in YarnCLI and not in ApplicationCLI?
- param appTypes - please add more docs for app types in the various places where this is an argument.
- In GetApplicationsRequestPBImpl, what happens if setApplicationTypes is called twice - the first with a non-empty set and the second call with a null? 
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
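Hitesh's last question above — what should happen when setApplicationTypes is called twice, first with a non-empty set and then with null — can be sketched in standalone form (the class below is illustrative, not the real GetApplicationsRequestPBImpl):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the setter semantics under review: a second call
// must fully replace the first call's state, so passing null clears any
// previously stored types instead of leaving stale values behind (the
// failure mode Hitesh is probing for in the PBImpl builder).
public class AppTypesRequestSketch {
  private Set<String> applicationTypes = null;

  public void setApplicationTypes(Set<String> types) {
    // Defensive copy; null means "no filter", discarding earlier values.
    this.applicationTypes = (types == null) ? null : new HashSet<>(types);
  }

  public Set<String> getApplicationTypes() {
    return applicationTypes == null
        ? Collections.emptySet()
        : Collections.unmodifiableSet(applicationTypes);
  }

  public static void main(String[] args) {
    AppTypesRequestSketch req = new AppTypesRequestSketch();
    req.setApplicationTypes(
        new HashSet<>(Collections.singleton("MAPREDUCE")));
    req.setApplicationTypes(null); // the second call must win
    System.out.println(req.getApplicationTypes().isEmpty());
  }
}
```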
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702547#comment-13702547 ] Hudson commented on YARN-791: - Integrated in Hadoop-trunk-Commit #4051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4051/]) YARN-791. Changed RM APIs and web-services related to nodes to ensure that both are consistent with each other. Contributed by Sandy Ryza. (Revision 1500994) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1500994 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetClusterNodesRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetClusterNodesRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702565#comment-13702565 ] Sandy Ryza commented on YARN-791: - Thanks Vinod! I initially tried something similar to what I now see was done in YARN-727, but wasn't satisfied with it. When you ask for usage you will get something like
{code}
 -list      Lists all the nodes in the cluster
 -status    Prints the status report of the node
 -states    A list of states to filter on
{code}
which didn't make sense to me because -states is a parameter to the -list command. Does this seem fine to you? Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702572#comment-13702572 ] Hadoop QA commented on YARN-295: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591298/YARN-295-trunk-3.patch against trunk revision. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1430//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1430//console This message is automatically generated. 
Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-904) Enable multiple QOP for ResourceManager
Benoy Antony created YARN-904: - Summary: Enable multiple QOP for ResourceManager Key: YARN-904 URL: https://issues.apache.org/jira/browse/YARN-904 Project: Hadoop YARN Issue Type: Improvement Reporter: Benoy Antony Currently the ResourceManager supports only a single QOP. This feature makes the ResourceManager listen on two ports for RPC: one RPC port supports authentication only, the other RPC port supports privacy. Please see HADOOP-9709 for general requirements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702585#comment-13702585 ] Vinod Kumar Vavilapalli commented on YARN-791: -- Yeah, I too saw that. Fundamentally the problem is that hadoop's commands don't follow standard unix patterns. Ideally we should have list, status, states as sub-commands. So you would say yarn application list -all, yarn application list -states RUNNING etc. I should have fixed these when I originally added the yarn CLI, but it may be too late now. It should be okay like that, or we can completely hijack the help message instead of using the Apache Commons utils. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702595#comment-13702595 ] Sandy Ryza commented on YARN-791: - Totally agree. Filed YARN-905 and we can figure out the best approach there. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-905) Add state filters to nodes CLI
Sandy Ryza created YARN-905: --- Summary: Add state filters to nodes CLI Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-905: Assignee: (was: Sandy Ryza) Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-905: Assignee: Wei Yan Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-894: --- Hadoop Flags: Reviewed +1 for the patch. I'll commit this. I also cannot repro the problem that I saw earlier. I see no obvious file handle leaks in the code. If the problem comes back, we can address it separately. NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
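The platform-independent idea behind the fix can be sketched in standalone form: decide "timed out" from an explicit flag set by the timer task rather than from whether reading the child's output happened to throw (Linux) or just hit EOF (Windows). All names below are illustrative, not the actual YARN-894 patch:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: classify the health-script outcome from a flag the
// timer task sets before destroying the process, so Linux (reader throws
// "Stream closed") and Windows (reader just returns -1 / EOF) both yield
// the same TIMED_OUT status.
public class TimeoutCheckSketch {
  private final AtomicBoolean timedOut = new AtomicBoolean(false);

  // Called by the timer task right before Process#destroy().
  void markTimedOut() {
    timedOut.set(true);
  }

  // Called after the output reader finishes, whether by EOF or exception.
  String classify(boolean readerThrew) {
    if (timedOut.get()) {
      return "TIMED_OUT"; // same answer on both platforms
    }
    return readerThrew ? "FAILED" : "HEALTHY";
  }

  public static void main(String[] args) {
    TimeoutCheckSketch s = new TimeoutCheckSketch();
    s.markTimedOut();
    // Windows-style EOF with no exception is still reported as a timeout.
    System.out.println(s.classify(false));
  }
}
```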
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702635#comment-13702635 ] Hudson commented on YARN-894: - Integrated in Hadoop-trunk-Commit #4053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4053/]) YARN-894. NodeHealthScriptRunner timeout checking is inaccurate on Windows. Contributed by Chuan Liu. (Revision 1501016) Result = SUCCESS cnauroth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501016 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 
3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702646#comment-13702646 ] Omkar Vinit Joshi commented on YARN-814: * yeah... what I am saying is the user can enable it if he wants via log4j. * my bad... in deleteAsUser we should probably just remove the container message altogether. Also, the exitCode check there can be replaced with logging the exception all the time. * replace appId with containerId (locId) in startLocalizer. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context (before the actual user's process is launched), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-894: --- Component/s: nodemanager NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.0-beta Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702689#comment-13702689 ] Sandy Ryza commented on YARN-366: - Rebased onto trunk again Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
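The parent-pointer idea in the YARN-366 proposal can be sketched in standalone form (class and field names below are hypothetical, not Sandy's patch):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the tracing proposal: each event keeps a pointer
// to the event that was being handled when it was dispatched, so when the
// dispatcher catches an exception it can walk the chain and log the causal
// sequence of events instead of an uninformative dispatcher-loop trace.
public class TracingEventSketch {
  static class Event {
    final String name;
    final Event parent; // event being handled when this one was dispatched

    Event(String name, Event parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  // Reconstruct the chain of events, oldest first.
  static String traceOf(Event e) {
    Deque<String> chain = new ArrayDeque<>();
    for (Event cur = e; cur != null; cur = cur.parent) {
      chain.push(cur.name);
    }
    return String.join(" -> ", chain);
  }

  public static void main(String[] args) {
    Event submit = new Event("APP_SUBMITTED", null);
    Event start = new Event("ATTEMPT_STARTED", submit);
    Event fail = new Event("CONTAINER_FINISHED", start);
    System.out.println(traceOf(fail));
    // APP_SUBMITTED -> ATTEMPT_STARTED -> CONTAINER_FINISHED
  }
}
```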
[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-366: Attachment: YARN-366-6.patch Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702693#comment-13702693 ] Xuan Gong commented on YARN-763: bq.On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running. If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
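The daemon-thread point in the comment above can be sketched like this (hypothetical class and method names, not the real AMRMClientAsync): marking the heartbeat thread as a daemon guarantees the JVM can still exit even if the thread is somehow left running, while stop() can safely interrupt and join it when the loop responds promptly to interruption.

```java
// Sketch (hypothetical names): a heartbeat loop on a daemon thread.
// Because the thread is a daemon, it cannot keep the JVM alive on its
// own; stop() can still interrupt and join it for an orderly shutdown.
class HeartbeaterSketch {
    private final Thread heartbeater;
    private volatile boolean keepRunning = true;

    HeartbeaterSketch() {
        heartbeater = new Thread(() -> {
            while (keepRunning) {
                try {
                    Thread.sleep(10); // stand-in for an allocate() heartbeat
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "AMRM Heartbeater");
        heartbeater.setDaemon(true); // JVM exit is not blocked by this thread
    }

    void start() {
        heartbeater.start();
    }

    void stop() {
        keepRunning = false;
        heartbeater.interrupt();
        try {
            heartbeater.join(); // safe here: the loop exits promptly on interrupt
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return heartbeater.isAlive();
    }

    public static void main(String[] args) {
        HeartbeaterSketch h = new HeartbeaterSketch();
        h.start();
        h.stop();
        System.out.println(h.isRunning()); // false after join() returns
    }
}
```

The deadlock risk Bikas describes arises only when join() waits on a thread that is itself blocked on locks held by the caller; a daemon flag does not remove that risk by itself, it only keeps the JVM from hanging at exit.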
[jira] [Updated] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-727: --- Attachment: YARN-727.19.patch ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702719#comment-13702719 ] Xuan Gong commented on YARN-727: Recreate the patch based on the latest trunk, and address all the latest comments. ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-661: --- Attachment: YARN-661-20130708.patch NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-347: Attachment: YARN-347-v2.patch YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702723#comment-13702723 ] Junping Du commented on YARN-347: - Sure. Rebase this patch against trunk in v2 patch. YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.3.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702753#comment-13702753 ] Hadoop QA commented on YARN-366: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591331/YARN-366-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestDiskFailures {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1431//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1431//console This message is automatically generated. 
Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702780#comment-13702780 ] Hadoop QA commented on YARN-763: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591349/YARN-763.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1432//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1432//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702788#comment-13702788 ] Hadoop QA commented on YARN-661: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591342/YARN-661-20130708.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1434//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1434//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1434//console This message is automatically generated. 
NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
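The failure mode described above follows from POSIX semantics: removing a directory entry requires write permission on the parent directory, not ownership of the entry being removed. A toy model of that rule (not NodeManager code; user names are illustrative):

```java
// Toy model of why deleting usercache/<user> as that user fails:
// POSIX unlink/rmdir requires write permission on the PARENT directory,
// regardless of who owns the entry being removed.
class DeletePermissionModel {
    static boolean canDelete(String actor, String parentOwner,
                             boolean parentWritableByOthers) {
        return actor.equals(parentOwner) || parentWritableByOthers;
    }

    public static void main(String[] args) {
        // The top-level usercache dir is owned by the NM user ("yarn") and is
        // not writable by others, so "alice" cannot remove usercache/alice
        // even though she owns it...
        System.out.println(canDelete("alice", "yarn", false)); // false
        // ...while the NM user can.
        System.out.println(canDelete("yarn", "yarn", false)); // true
    }
}
```

This is why deletion-as-user fails on startup: the per-user directory is owned by the user, but its parent is not writable by that user.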
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702792#comment-13702792 ] Hadoop QA commented on YARN-347: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591343/YARN-347-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1435//console This message is automatically generated. YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.4.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.5.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702799#comment-13702799 ] Xuan Gong commented on YARN-763: fix the test case failure AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702812#comment-13702812 ] Hadoop QA commented on YARN-727: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591339/YARN-727.19.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1433//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1433//console This message is automatically generated. 
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
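The proposed filtering can be sketched as follows (toy types and method names, not the actual ClientRMProtocol API): given a set of application types, return only matching applications, with an empty filter set meaning "return everything".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of application-type filtering (hypothetical names): an empty or
// null type set returns all applications; otherwise only matching types.
class AppTypeFilterSketch {
    static class App {
        final String id;
        final String type;

        App(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    static List<App> getApplications(List<App> all, Set<String> types) {
        if (types == null || types.isEmpty()) {
            return new ArrayList<>(all);
        }
        List<App> out = new ArrayList<>();
        for (App a : all) {
            if (types.contains(a.type)) {
                out.add(a);
            }
        }
        return out;
    }
}
```

Filtering on the RM side rather than in the client keeps the response small when a cluster has many applications of other types.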
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702821#comment-13702821 ] Hadoop QA commented on YARN-763: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591357/YARN-763.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1436//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1436//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-592) Container logs lost for the application when NM gets restarted
[ https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702823#comment-13702823 ] Omkar Vinit Joshi commented on YARN-592: I just looked at your patch; I need more information to understand it better. * Are you assuming that after the NM restarts, an application whose containers were running on that NodeManager will again get new containers on the same NodeManager? At present the NM doesn't remember the applications that were running on it across a restart, and the RM doesn't inform the NM about all the running applications in the cluster. * Across an NM restart, an application might still be running or it might have just finished before the restart. Do you want to upload the logs for both scenarios? At present we upload logs only when the application finishes... Container logs lost for the application when NM gets restarted -- Key: YARN-592 URL: https://issues.apache.org/jira/browse/YARN-592 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.3-alpha Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-592.patch While running a big job, if the NM goes down for some reason and comes back, it will do log aggregation for the newly launched containers and delete all the containers for the application. In this case we don't get the container logs, from HDFS or locally, for the containers that were launched before the restart and completed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702870#comment-13702870 ] Joseph Kniest commented on YARN-644: I'm new to MapReduce2/YARN development and not yet familiar with the source. Would this simply be exiting the function with a return upon detecting a null object? Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Priority: Minor I see that validation/null checks are not performed on passed-in parameters, e.g. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest(). I guess we should add these checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
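Rather than silently returning on a null, argument validation usually fails fast with a descriptive exception so the caller gets a clear error instead of a NullPointerException from deep inside the call tree. A hedged sketch with stub types (not the real YARN classes):

```java
// Sketch (stub types, not the real YARN classes): validate the token's
// id chain up front and fail fast with a descriptive exception instead
// of letting a NullPointerException surface inside authorizeRequest().
class StartContainerValidation {
    static class ApplicationAttemptId {}

    static class ContainerId {
        final ApplicationAttemptId attemptId;
        ContainerId(ApplicationAttemptId a) { attemptId = a; }
        ApplicationAttemptId getApplicationAttemptId() { return attemptId; }
    }

    static class ContainerTokenIdentifier {
        final ContainerId containerId;
        ContainerTokenIdentifier(ContainerId c) { containerId = c; }
        ContainerId getContainerID() { return containerId; }
    }

    static ApplicationAttemptId validate(ContainerTokenIdentifier tokenId) {
        if (tokenId == null) {
            throw new IllegalArgumentException("null container token");
        }
        ContainerId cid = tokenId.getContainerID();
        if (cid == null) {
            throw new IllegalArgumentException("token has no container id");
        }
        ApplicationAttemptId attempt = cid.getApplicationAttemptId();
        if (attempt == null) {
            throw new IllegalArgumentException("container id has no app attempt id");
        }
        return attempt;
    }
}
```

Whether to throw or to return an error response over the protocol is a design choice; the key point is checking each link of the chain before dereferencing it.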
[jira] [Updated] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-299: --- Attachment: YARN-299-trunk-2.patch Updating the patch. Thanks, Mayank Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
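One common fix pattern for an "Invalid event: RESOURCE_FAILED at DONE" error is to register an explicit no-op (self-)transition for late events arriving at a terminal state. A toy state machine illustrating the pattern (not the real StateMachineFactory, and not necessarily what the attached patch does):

```java
import java.util.HashMap;
import java.util.Map;

// Toy state machine illustrating the fix pattern: register the late
// RESOURCE_FAILED event as an explicit self-transition at DONE, so it
// no longer raises an invalid-transition error.
class ContainerStateSketch {
    enum State { RUNNING, DONE }
    enum Event { RESOURCE_FAILED, CONTAINER_DONE }

    static final Map<String, State> TRANSITIONS = new HashMap<>();
    static {
        TRANSITIONS.put(State.RUNNING + ":" + Event.CONTAINER_DONE, State.DONE);
        TRANSITIONS.put(State.RUNNING + ":" + Event.RESOURCE_FAILED, State.DONE);
        // The fix: a late RESOURCE_FAILED at DONE is a harmless no-op.
        TRANSITIONS.put(State.DONE + ":" + Event.RESOURCE_FAILED, State.DONE);
    }

    static State handle(State current, Event e) {
        State next = TRANSITIONS.get(current + ":" + e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        return next;
    }
}
```

Such late events are expected in an asynchronous dispatcher: a localization failure can be queued while the container is already transitioning to DONE, so the terminal state must tolerate it.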
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702918#comment-13702918 ] Hadoop QA commented on YARN-299:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591378/YARN-299-trunk-2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1437//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1437//console This message is automatically generated. 
Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702933#comment-13702933 ] Xuan Gong commented on YARN-875: For the callback, we can catch Throwable and call handler.onError(Exception). This tells the ApplicationMaster to jump out of its loop and go into the finish function. Eventually, AMRMClientAsync will call unregisterApplicationMaster and set the keepRunning flag to false, which stops the heartbeat thread. But we can let the heartbeat thread stop a little earlier. Option one: inside the catch block, call heartBeatThread.interrupt() and set keepRunning = false. Option two: define a volatile Exception savedCallBackException, set it inside the catch block, and inside heartBeatThread.run(), before we do the allocate(), always check whether savedCallBackException is null. [~bikassaha] any other suggestions? Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Currently that thread will die and then never call back, so the app can hang. A possible solution is to catch Throwable in the callback and then call client.onError().
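The two options in the comment above can be combined in a small sketch. This is a hypothetical, simplified stand-in for the AMRMClientAsync internals, not the real implementation; the names savedCallbackException, keepRunning, and heartbeatOnce follow the comment's wording but are illustrative only:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of guarding an async client against a dying callback
// thread: catch Throwable, record the failure, stop heartbeating early,
// and surface the error to the handler.
public class CallbackGuardSketch {
    interface CallbackHandler { void onError(Exception e); }

    private volatile boolean keepRunning = true;
    private final AtomicReference<Throwable> savedCallbackException =
        new AtomicReference<>();
    private final CallbackHandler handler;

    CallbackGuardSketch(CallbackHandler handler) { this.handler = handler; }

    // Invoked on the callback thread: never let a Throwable kill it silently.
    void runCallback(Runnable userCallback) {
        try {
            userCallback.run();
        } catch (Throwable t) {
            savedCallbackException.set(t);     // option two: remember the failure
            keepRunning = false;               // option one: stop the heartbeat early
            handler.onError(new Exception(t)); // let the AM exit its loop and finish
        }
    }

    // One heartbeat iteration; skips allocate() once an error was recorded.
    boolean heartbeatOnce() {
        if (!keepRunning || savedCallbackException.get() != null) {
            return false; // thread winds down without another allocate()
        }
        // ... allocate() would go here ...
        return true;
    }

    public static void main(String[] args) {
        CallbackGuardSketch s = new CallbackGuardSketch(e -> System.out.println("onError: " + e));
        s.runCallback(() -> { throw new RuntimeException("callback failed"); });
        System.out.println("heartbeat continues: " + s.heartbeatOnce());
    }
}
```

The volatile flag plus the AtomicReference check before each allocate() is what keeps the heartbeat thread from spinning on behalf of an application that can no longer make progress.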
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702938#comment-13702938 ] Xuan Gong commented on YARN-873: [~bikassaha] any comments? YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong How can the client find out that the app does not exist?
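Until the server side signals unknown applications explicitly, the caller's only hint is the null report itself. A hedged sketch of a client-side guard that converts the null into an explicit error; ReportSource and requireReport are hypothetical stand-ins, and the only behavior taken from the issue is that an unknown appId yields null:

```java
// Hedged sketch: wrap a client that returns null for unknown application
// IDs so callers get an explicit exception instead of a null report.
public class UnknownAppSketch {
    // Hypothetical stand-in for the client's report lookup.
    interface ReportSource { String getApplicationReport(String appId); }

    static String requireReport(ReportSource client, String appId) {
        String report = client.getApplicationReport(appId);
        if (report == null) {
            // Surface "app does not exist" explicitly instead of a null.
            throw new IllegalArgumentException("Unknown application: " + appId);
        }
        return report;
    }

    public static void main(String[] args) {
        ReportSource stub = id -> id.equals("app_1") ? "report for app_1" : null;
        System.out.println(requireReport(stub, "app_1"));
    }
}
```

A dedicated exception type thrown by the server would of course be cleaner, since a wrapper like this cannot distinguish "unknown app" from any other reason a report might be missing.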
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702940#comment-13702940 ] Xuan Gong commented on YARN-643: [~vinodkv] any comments? WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong
[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702942#comment-13702942 ] Mayank Bansal commented on YARN-296: Thanks [~zjshen] and [~ojoshi] for the review. I think what [~vinodkv] mentioned here can cause this: https://issues.apache.org/jira/browse/YARN-295?focusedCommentId=13675472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13675472 Thoughts? Thanks, Mayank Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl Key: YARN-296 URL: https://issues.apache.org/jira/browse/YARN-296 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch {code:xml} 2012-12-28 11:14:47,671 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}