[jira] [Commented] (YARN-902) Used Resources field in Resourcemanager scheduler UI not displaying any values
[ https://issues.apache.org/jira/browse/YARN-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701988#comment-13701988 ]

Thomas Graves commented on YARN-902:

Are you using the latest branch-2 or the released 2.0.5-alpha? This might be a duplicate of YARN-764.

Used Resources field in Resourcemanager scheduler UI not displaying any values
Key: YARN-902
URL: https://issues.apache.org/jira/browse/YARN-902
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler
Affects Versions: 2.0.5-alpha
Reporter: Nishan Shetty
Priority: Minor

Used Resources field in Resourcemanager scheduler UI not displaying any values.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-347:
Issue Type: Improvement (was: Bug)

YARN node CLI should also show CPU info as memory info in node status
Key: YARN-347
URL: https://issues.apache.org/jira/browse/YARN-347
Project: Hadoop YARN
Issue Type: Improvement
Components: client
Reporter: Junping Du
Assignee: Junping Du
Attachments: YARN-347.patch

With YARN-2 checked in, CPU info is now taken into consideration in resource scheduling. {{yarn node -status NodeID}} should show CPU usage and capacity info, just as it shows memory info.
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702158#comment-13702158 ]

Omkar Vinit Joshi commented on YARN-644:

[~josephkniest] In these scenarios we should simply reject the client request. If the client has fabricated the container token or NMToken (YARN-613), then these scenarios (NPE) are quite possible. Ideally we should reject them as invalid tokens.

Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
Key: YARN-644
URL: https://issues.apache.org/jira/browse/YARN-644
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Priority: Minor

I see that validation/null checks are not performed on passed-in parameters, e.g. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest(). I guess we should add these checks.
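A minimal sketch of the kind of up-front validation being discussed. The class and field names below are illustrative stand-ins, not the actual ContainerManagerImpl code: the point is to reject a request with missing token fields explicitly rather than letting an NPE propagate out of authorizeRequest().

```java
// Hypothetical sketch: validate token fields before dereferencing them.
// Names here (TokenValidation, ContainerTokenId) are illustrative only.
public final class TokenValidation {

    /** Minimal stand-in for the fields authorizeRequest() dereferences. */
    public static class ContainerTokenId {
        public Object containerId;   // would be ContainerId in YARN
        public Object appAttemptId;  // would be ApplicationAttemptId in YARN
    }

    /** Throws (standing in for an invalid-token rejection) on any null field. */
    public static void validate(ContainerTokenId tokenId) {
        if (tokenId == null || tokenId.containerId == null
                || tokenId.appAttemptId == null) {
            throw new IllegalArgumentException(
                "Invalid container token: missing required fields");
        }
    }

    /** Convenience wrapper: true if the token passes validation. */
    public static boolean isValid(ContainerTokenId tokenId) {
        try {
            validate(tokenId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A fabricated or truncated token then produces a clean rejection the RPC layer can report back to the client, instead of a server-side NullPointerException.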
[jira] [Updated] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-292:
Issue Type: Sub-task (was: Bug)
Parent: YARN-676

ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
Key: YARN-292
URL: https://issues.apache.org/jira/browse/YARN-292
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.0.1-alpha
Reporter: Devaraj K

{code:xml}
2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01
2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525
java.lang.ArrayIndexOutOfBoundsException: 0
	at java.util.Arrays$ArrayList.get(Arrays.java:3381)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702177#comment-13702177 ]

Zhijie Shen commented on YARN-502:

+1, the patch looks good to me.

RM crash with NPE on NODE_REMOVED event with FairScheduler
Key: YARN-502
URL: https://issues.apache.org/jira/browse/YARN-502
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.3-alpha
Reporter: Lohit Vijayarenu
Assignee: Mayank Bansal
Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch

While running some tests and adding/removing nodes, we saw the RM crash with the below exception. We are testing with the fair scheduler and running hadoop-2.0.3-alpha.

{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
	at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}
[jira] [Updated] (YARN-623) NodeManagers on RM web-app don't have diagnostic information
[ https://issues.apache.org/jira/browse/YARN-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-623:
Issue Type: Improvement (was: Bug)

NodeManagers on RM web-app don't have diagnostic information
Key: YARN-623
URL: https://issues.apache.org/jira/browse/YARN-623
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
Labels: usability

If the RM for some reason asks NMs to shut down or reboot, it would be very useful to show that information on the UI so that operators can see it directly, instead of logging in to machines and looking through logs.
[jira] [Updated] (YARN-397) RM Scheduler api enhancements
[ https://issues.apache.org/jira/browse/YARN-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-397:
Issue Type: Improvement (was: Bug)

RM Scheduler api enhancements
Key: YARN-397
URL: https://issues.apache.org/jira/browse/YARN-397
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy

Umbrella JIRA tracking enhancements to RM APIs.
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702237#comment-13702237 ]

Chris Nauroth commented on YARN-894:

Hi, Chuan. This patch looks good, but I'm seeing a failure in the test on my Windows machine. If I run just {{TestNodeHealthService#testNodeHealthScript}}, then it passes. If I run the whole {{TestNodeHealthService}} suite, then that same test fails with:

{code}
testNodeHealthScript(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService)  Time elapsed: 187 sec  ERROR!
java.io.FileNotFoundException: C:\hdc\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (The process cannot access the file because it is being used by another process)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82)
	at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScript(TestNodeHealthService.java:154)
{code}

Do you see this happen too? It's probably a file leak out of the prior test.

NodeHealthScriptRunner timeout checking is inaccurate on Windows
Key: YARN-894
URL: https://issues.apache.org/jira/browse/YARN-894
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch

In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout. We have the following execution sequence in Shell:
1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In the main thread, open a buffered reader and feed in the process's standard input stream.
3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread. On Windows, we don't get the IOException; only -1 is returned from the reader, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this.
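The platform difference described above can be sketched as a small decision function. This is illustrative only (the enum and method names are assumptions, not the real NodeHealthScriptRunner API): the idea is to consult the executor's explicit timed-out flag first, rather than inferring a timeout from which exception the platform happens to throw.

```java
// Illustrative sketch, not the actual Hadoop code: classify a health-script
// run from (exitCode, whether reading output threw, the timeout flag).
public final class HealthStatusSketch {

    public enum Status { HEALTHY, UNHEALTHY, TIMED_OUT, FAILED }

    /**
     * exitCode: script exit code (0 = healthy);
     * threwIoException: reading the script's output stream failed;
     * timedOut: the executor's explicit timeout flag.
     */
    public static Status classify(int exitCode, boolean threwIoException,
                                  boolean timedOut) {
        // Check the timeout flag first: on Windows the killed process just
        // ends the stream (read returns -1) with no IOException, so the
        // flag is the only reliable timeout signal on both platforms.
        if (timedOut) {
            return Status.TIMED_OUT;
        }
        if (threwIoException) {
            return Status.FAILED;
        }
        return exitCode == 0 ? Status.HEALTHY : Status.UNHEALTHY;
    }
}
```

With this ordering, the Linux path (IOException plus timeout flag) and the Windows path (clean -1 EOF plus timeout flag) both classify as TIMED_OUT.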
[jira] [Updated] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-865:
Issue Type: Improvement (was: Bug)

RM webservices can't query on application Types
Key: YARN-865
URL: https://issues.apache.org/jira/browse/YARN-865
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: MR-5337.1.patch

The ResourceManager web service API to get the list of apps doesn't have a query parameter for appTypes.
[jira] [Updated] (YARN-843) TestPipeApplication should not be using AMRMToken.
[ https://issues.apache.org/jira/browse/YARN-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-843:
Issue Type: Improvement (was: Bug)

TestPipeApplication should not be using AMRMToken.
Key: YARN-843
URL: https://issues.apache.org/jira/browse/YARN-843
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Omkar Vinit Joshi

[YARN-822 comment | https://issues.apache.org/jira/browse/YARN-822?focusedCommentId=13685802&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13685802] Maybe we can just remove the token usage.
[jira] [Updated] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting
[ https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-807:
Issue Type: Improvement (was: Bug)

When querying apps by queue, iterating over all apps is inefficient and limiting
Key: YARN-807
URL: https://issues.apache.org/jira/browse/YARN-807
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza

The question "which apps are in queue X" can be asked via the RM REST APIs, through the ClientRMService, and through the command line. In all these cases, the question is answered by scanning through every RMApp and filtering by the app's queue name. All schedulers maintain a mapping of queues to applications, so I think it would make more sense to ask the schedulers which applications are in a given queue. This is what was done in MR1. It would also have the advantage of allowing a parent queue to return all the applications in leaf queues under it, and of allowing queue-name aliases, as in the way that root.default and default refer to the same queue in the fair scheduler.
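The queue-to-applications mapping mentioned above can be sketched as a simple index. The names here are hypothetical (this is not the actual scheduler API): the point is that a maintained index answers "apps in queue X" per queue, instead of filtering every RMApp on each query.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a scheduler-side queue -> applications index;
// class and method names are illustrative, not the real YARN API.
public final class QueueIndex {

    private final Map<String, Set<String>> appsByQueue = new HashMap<>();

    /** Record an application as belonging to a queue (on submission). */
    public void addApp(String queue, String appId) {
        appsByQueue.computeIfAbsent(queue, q -> new HashSet<>()).add(appId);
    }

    /** Direct per-queue lookup, versus scanning every RMApp and filtering. */
    public Set<String> getAppsInQueue(String queue) {
        return appsByQueue.getOrDefault(queue, Collections.emptySet());
    }
}
```

A parent queue could then be answered by unioning the index entries of its leaf queues, and aliases (root.default vs. default) by normalizing the key before lookup.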
[jira] [Updated] (YARN-115) yarn commands shouldn't add m to the heapsize
[ https://issues.apache.org/jira/browse/YARN-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-115:
Issue Type: Improvement (was: Bug)

yarn commands shouldn't add m to the heapsize
Key: YARN-115
URL: https://issues.apache.org/jira/browse/YARN-115
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 0.23.3
Reporter: Thomas Graves
Labels: usability

The yarn commands add "m" to the heapsize. This is unlike the HDFS side, and unlike what the old JT/TT scripts used to do.

{noformat}
JAVA_HEAP_MAX=-Xmx$YARN_RESOURCEMANAGER_HEAPSIZEm
JAVA_HEAP_MAX=-Xmx$YARN_NODEMANAGER_HEAPSIZEm
{noformat}

We should not add in the "m", and should allow the user to specify units.
[jira] [Updated] (YARN-148) CapacityScheduler shouldn't explicitly need YarnConfiguration
[ https://issues.apache.org/jira/browse/YARN-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-148:
Issue Type: Improvement (was: Bug)

CapacityScheduler shouldn't explicitly need YarnConfiguration
Key: YARN-148
URL: https://issues.apache.org/jira/browse/YARN-148
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

This was done in MAPREDUCE-3773. None of our service APIs warrant YarnConfiguration. We effect the proper loading of yarn-site.xml by explicitly creating YarnConfiguration in all the main classes - ResourceManager, NodeManager etc. Due to this extra dependency, tests are failing, see https://builds.apache.org/job/PreCommit-YARN-Build/74//testReport/org.apache.hadoop.yarn.client/TestYarnClient/testClientStop/.
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702245#comment-13702245 ]

Zhijie Shen commented on YARN-295:

I agree with moving RMAppAttempt from ALLOCATED to FAILED through AMContainerCrashedTransition. WRT the test, is the following not necessary?

{code}
+launchApplicationAttempt(amContainer);
+runApplicationAttempt(amContainer, host, 8042, oldtrackingurl);
{code}

See testAllocatedToFailed.

Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
Key: YARN-295
URL: https://issues.apache.org/jira/browse/YARN-295
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch

{code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
[jira] [Updated] (YARN-8) Add more unit tests for CPU scheduling in CS
[ https://issues.apache.org/jira/browse/YARN-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-8:
Issue Type: Improvement (was: Bug)

Add more unit tests for CPU scheduling in CS
Key: YARN-8
URL: https://issues.apache.org/jira/browse/YARN-8
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy

Companion to YARN-2.
[jira] [Updated] (YARN-601) Refactoring the code which computes the user file cache and user application file cache paths
[ https://issues.apache.org/jira/browse/YARN-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-601:
Issue Type: Improvement (was: Bug)

Refactoring the code which computes the user file cache and user application file cache paths
Key: YARN-601
URL: https://issues.apache.org/jira/browse/YARN-601
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Omkar Vinit Joshi
Priority: Minor

At present, the user local cache file path and user application file path are computed in multiple places. It would be better to expose them as single static utility methods and reuse them everywhere else. Locations:
* ContainerLaunch
* DefaultContainerExecutor: this already has some methods like this
* ResourceLocalizationService: getUserCacheFilePath, getAppFileCachePath
* ContainerLocalizer
* ShuffleHandler.Shuffle
* TestContainerLocalizer, TestContainerManager, TestDefaultContainerExecutor and TestResourceLocalizationService
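A sketch of what such a shared utility might look like. The usercache/&lt;user&gt;/{filecache, appcache/&lt;appId&gt;} layout matches the NodeManager's local-dir structure, but the class and method names below are illustrative, not the actual utility proposed in the JIRA.

```java
// Hedged sketch of a single static utility for the cache paths that are
// currently computed independently in several classes. "/" is used as the
// separator for determinism in this example.
public final class LocalCachePaths {

    private LocalCachePaths() {} // static utility, no instances

    /** Per-user public file cache: <localDir>/usercache/<user>/filecache */
    public static String userFileCachePath(String localDir, String user) {
        return localDir + "/usercache/" + user + "/filecache";
    }

    /** Per-app cache: <localDir>/usercache/<user>/appcache/<appId> */
    public static String appFileCachePath(String localDir, String user,
                                          String appId) {
        return localDir + "/usercache/" + user + "/appcache/" + appId;
    }
}
```

Centralizing the layout this way means a future change to the directory structure touches one class instead of ContainerLaunch, DefaultContainerExecutor, ResourceLocalizationService, ContainerLocalizer, ShuffleHandler and the tests.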
[jira] [Updated] (YARN-545) NodeResourceMonitor and its Impl are empty and may be removed
[ https://issues.apache.org/jira/browse/YARN-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-545:
Issue Type: Improvement (was: Bug)

NodeResourceMonitor and its Impl are empty and may be removed
Key: YARN-545
URL: https://issues.apache.org/jira/browse/YARN-545
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Bikas Saha
Priority: Minor
[jira] [Updated] (YARN-100) container-executor should deal with stdout, stderr better
[ https://issues.apache.org/jira/browse/YARN-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-100:
Issue Type: Improvement (was: Bug)

container-executor should deal with stdout, stderr better
Key: YARN-100
URL: https://issues.apache.org/jira/browse/YARN-100
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.0.1-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

container-executor.c contains the following code:

{code}
fclose(stdin);
fflush(LOGFILE);
if (LOGFILE != stdout) {
  fclose(stdout);
}
if (ERRORFILE != stderr) {
  fclose(stderr);
}
if (chdir(primary_app_dir) != 0) {
  fprintf(LOGFILE, "Failed to chdir to app dir - %s\n", strerror(errno));
  return -1;
}
execvp(args[0], args);
{code}

Whenever you open a new file descriptor, it gets the lowest available number. So if {{stdout}} (fd number 1) has been closed, and you do open("/my/important/file"), you'll get assigned file descriptor 1. This means that any printf statements in the program will now be printing to /my/important/file. Oops! The correct way to get rid of stdin, stdout, or stderr is not to close them, but to make them point to /dev/null; {{dup2}} can be used for this purpose. It looks like LOGFILE and ERRORFILE are always set to stdout and stderr at the moment. However, this is a latent bug that should be fixed in case these are ever made configurable (which seems to have been the intent).
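The dup2 approach described above can be sketched as follows. This is an illustrative fix under the stated assumptions, not the actual container-executor patch: the standard descriptors are pointed at /dev/null instead of being closed, so a later open() can never be assigned fd 0/1/2 and silently receive printf output.

```c
/* Sketch: silence the standard streams safely with dup2 rather than fclose. */
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 on success, -1 on failure. After this call, fds 0/1/2 all refer
 * to /dev/null, so any stray printf/perror output is harmlessly discarded
 * and newly opened files are guaranteed an fd above 2. */
static int silence_std_streams(void) {
    int devnull = open("/dev/null", O_RDWR);
    if (devnull < 0) {
        return -1;
    }
    /* dup2 atomically closes the target fd and makes it a copy of devnull. */
    if (dup2(devnull, STDIN_FILENO) < 0 ||
        dup2(devnull, STDOUT_FILENO) < 0 ||
        dup2(devnull, STDERR_FILENO) < 0) {
        close(devnull);
        return -1;
    }
    /* The extra descriptor is no longer needed once 0/1/2 are redirected. */
    if (devnull > STDERR_FILENO) {
        close(devnull);
    }
    return 0;
}
```

Compare with fclose(stdout): after silence_std_streams(), opening "/my/important/file" yields an fd greater than 2, so printf cannot leak into it.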
[jira] [Resolved] (YARN-52) Seed TestYarnClient with tests
[ https://issues.apache.org/jira/browse/YARN-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved YARN-52.
Resolution: Duplicate

Some tests were already added via recent tickets. Closing this.

Seed TestYarnClient with tests
Key: YARN-52
URL: https://issues.apache.org/jira/browse/YARN-52
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli

YARN-29 added TestYarnClient with no tests. We need to add client-specific tests validating the client contracts.
[jira] [Updated] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-766:
Issue Type: Improvement (was: Bug)

TestNodeManagerShutdown should use Shell to form the output path
Key: YARN-766
URL: https://issues.apache.org/jira/browse/YARN-766
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt

{{File scriptFile = new File(tmpDir, "scriptFile.sh");}} should be replaced with {{File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");}} to match trunk.
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702308#comment-13702308 ]

Omkar Vinit Joshi commented on YARN-245:

I think this will not fix the root cause. Looking at the current transitions, it seems that ApplicationImpl got 2 events (APPLICATION_FINISH) when it only expects one in its life cycle. The first event made the successful transition, but the second event, which in this case occurred at the FINISHED state, created the invalid transition. Looking at the code, it appears the code below sent two events in consecutive loop cycles (node heartbeats), which caused the problem. [~devaraj.k] Is there any way we can reproduce this? Did you see that error again?

NodeStatusUpdaterImpl.run:
{code}
if (appsToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedAppsEvent(appsToCleanup));
}
{code}

[~mayank_bansal] I think we need to fix the NodeStatusUpdaterImpl.run code. At present it doesn't check whether the NM received 2 identical responses, i.e. the NM sent a heartbeat but didn't get the response from the RM, so it sent the heartbeat again, and in turn the RM sent 2 identical responses. The side effect of this is that the NM already sent the application-finished event for the first response, which will create a problem if it tries to send it again on the next identical heartbeat.

{code}
lastHeartBeatID = response.getResponseId();
List<ContainerId> containersToCleanup = response.getContainersToCleanup();
if (containersToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedContainersEvent(containersToCleanup,
          CMgrCompletedContainersEvent.Reason.BY_RESOURCEMANAGER));
}
List<ApplicationId> appsToCleanup = response.getApplicationsToCleanup();
// Only start tracking for keepAlive on FINISH_APP
trackAppsForKeepAlive(appsToCleanup);
if (appsToCleanup.size() != 0) {
  dispatcher.getEventHandler().handle(
      new CMgrCompletedAppsEvent(appsToCleanup));
}
{code}

I think we can reproduce this if we send the same heartbeat response again, including the application-finish event. Any thoughts?

Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
Key: YARN-245
URL: https://issues.apache.org/jira/browse/YARN-245
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.0.2-alpha, 2.0.1-alpha
Reporter: Devaraj K
Assignee: Mayank Bansal
Attachments: YARN-245-trunk-1.patch

{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
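The duplicate-heartbeat check being proposed can be sketched as follows. This is an illustrative sketch, not the real NodeStatusUpdaterImpl: the idea is to drop a heartbeat response whose responseId matches the last one processed, so a retransmitted RM response cannot deliver the application-cleanup list (and thus APPLICATION_FINISH) twice.

```java
// Hypothetical sketch of de-duplicating RM heartbeat responses by responseId;
// the class name and accept() method are illustrative only.
public final class HeartbeatDedup {

    private int lastHeartBeatID = -1;

    /** Returns true if this response is new and its events should be dispatched. */
    public boolean accept(int responseId) {
        if (responseId == lastHeartBeatID) {
            // Identical retransmission: the cleanup events for this response
            // were already dispatched, so processing it again would send a
            // second FINISH_APPLICATION to an already-FINISHED ApplicationImpl.
            return false;
        }
        lastHeartBeatID = responseId;
        return true;
    }
}
```

In the run loop, the containersToCleanup / appsToCleanup handling would then be guarded by accept(response.getResponseId()), making a replayed response a no-op.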
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702365#comment-13702365 ]

Alejandro Abdelnur commented on YARN-149:

Karthik, at a high level this seems OK to me. One thing I would add: for HTTP failover, if we take the RMHA wrapper approach, the wrapper, when in standby, would redirect HTTP calls to the active RM. While this does not cover rerouting when hitting an RM that crashed, it covers the common case where somebody hits the running standby.

ResourceManager (RM) High-Availability (HA)
Key: YARN-149
URL: https://issues.apache.org/jira/browse/YARN-149
Project: Hadoop YARN
Issue Type: New Feature
Reporter: Harsh J
Assignee: Bikas Saha
Attachments: rm-ha-phase1-approach-draft1.pdf

This JIRA tracks the work needed to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to the leader, and client redirection to the new leader.
[jira] [Commented] (YARN-149) ResourceManager (RM) High-Availability (HA)
[ https://issues.apache.org/jira/browse/YARN-149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702378#comment-13702378 ] Bikas Saha commented on YARN-149: - Thanks for the notes Karthik. I will go through it and incorporate stuff into the final document. ResourceManager (RM) High-Availability (HA) --- Key: YARN-149 URL: https://issues.apache.org/jira/browse/YARN-149 Project: Hadoop YARN Issue Type: New Feature Reporter: Harsh J Assignee: Bikas Saha Attachments: rm-ha-phase1-approach-draft1.pdf This jira tracks work needed to be done to support one RM instance failing over to another RM instance so that we can have RM HA. Work includes leader election, transfer of control to leader and client re-direction to new leader. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702402#comment-13702402 ] Chuan Liu commented on YARN-894: I only saw this once. After a clean rebuild, I could not reproduce the error in subsequent runs. It could be a timing issue. I think the test case had written to the same script when it failed at {{TestNodeHealthService.java:154}}. NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on exceptions thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell also has the timeout status set, we set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader over the process's output stream. 3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process. On Linux, when the timeout fires and the process is killed, the buffered reader in the main thread throws an IOException with the message "Stream closed". On Windows, there is no IOException; the reader simply returns -1, indicating the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
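The platform difference described above suggests not inferring the timeout from stream behavior at all. The sketch below (illustrative only, not the actual NodeHealthScriptRunner code; it assumes a POSIX `sleep`/`true` command for the demo) records the timeout in an explicit flag set by the timer task before the kill, then consults that flag once the read loop ends, which works the same whether the reader throws or just returns -1:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutCheckDemo {
    /** Runs a command, killing it after timeoutMs; returns true iff it timed out. */
    public static boolean runWithTimeout(long timeoutMs, String... cmd) throws Exception {
        final Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        final AtomicBoolean timedOut = new AtomicBoolean(false);
        Timer timer = new Timer(true);
        timer.schedule(new TimerTask() {
            @Override public void run() {
                timedOut.set(true);   // record the timeout explicitly...
                p.destroy();          // ...before killing the process
            }
        }, timeoutMs);
        // On Linux the reader may throw "Stream closed" after destroy();
        // on Windows it just returns end-of-stream. Either way we rely on
        // the flag, not on the stream's behavior.
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            while (r.readLine() != null) { /* drain output */ }
        } catch (IOException ignored) {
            // expected on some platforms when the timer killed the process mid-read
        }
        p.waitFor();
        timer.cancel();
        return timedOut.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + runWithTimeout(500, "sleep", "5"));
    }
}
```

The key ordering is that the flag is set before `destroy()`, so by the time the reader observes the dead process the timeout is already recorded.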
[jira] [Updated] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-521: Attachment: YARN-521-1.patch Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702424#comment-13702424 ] Sandy Ryza commented on YARN-521: - I uploaded a new patch that adds in the checks mentioned by Bikas and Alejandro. I moved the parameter documentation to constructors. Regarding the priority comments, I think the new ones are clearer, but they're out of the scope of this JIRA, so I reverted them and I'll argue somewhere else if I want to make that change. Regarding renaming allRacks to dedupedRacks, the code is doing the same thing as it was before for requests with locality relaxed, but I had to move some things around to accommodate disabling locality relaxation. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
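The kind of check being discussed (requests at one priority must all agree on whether locality relaxation is enabled) can be sketched independently of the real AMRMClient. The class and field names below are illustrative stand-ins, not the actual YARN-521 patch:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityCheckDemo {
    /** Minimal stand-in for AMRMClient.ContainerRequest (hypothetical). */
    public static class Request {
        final int priority;
        final boolean relaxLocality;
        public Request(int priority, boolean relaxLocality) {
            this.priority = priority;
            this.relaxLocality = relaxLocality;
        }
    }

    // Remembers the relaxLocality setting first seen at each priority.
    private final Map<Integer, Boolean> relaxAtPriority = new HashMap<>();

    /** Rejects a request whose relaxLocality disagrees with earlier requests at the same priority. */
    public void addRequest(Request req) {
        Boolean existing = relaxAtPriority.putIfAbsent(req.priority, req.relaxLocality);
        if (existing != null && existing != req.relaxLocality) {
            throw new IllegalArgumentException(
                "Cannot mix relaxed and non-relaxed locality at priority " + req.priority);
        }
    }

    public static void main(String[] args) {
        LocalityCheckDemo demo = new LocalityCheckDemo();
        demo.addRequest(new Request(1, false));
        try {
            demo.addRequest(new Request(1, true));   // conflicting setting at priority 1
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast at add time keeps the scheduler from ever seeing an ambiguous mix of relaxed and pinned requests at one priority.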
[jira] [Updated] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-791: Attachment: YARN-791-8.patch Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702427#comment-13702427 ] Sandy Ryza commented on YARN-791: - Uploaded a patch that updates the CLI documentation. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702432#comment-13702432 ] Hadoop QA commented on YARN-521: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591273/YARN-521-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1427//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1427//console This message is automatically generated. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702443#comment-13702443 ] Jian He commented on YARN-513: -- bq. Does ResourceTrackerClientPBImpl still need a close method after this code? If yes, then why does this code not call close() instead? These two methods are separate: one is for ResourceTrackerPB.class, the other for ResourceTracker.class. We can also see this from YarnClientImpl.serviceStop and ApplicationClientProtocolPBClientImpl.close(). I have not investigated the deeper implementations. bq. Does this need an @VisibleForTesting flag? This is actually only overridden in a test class, not directly called in a test case. bq. Why has this been removed in testConnectionNMToRM? Was this redundant earlier? Does some new test code cover this check? I don't see the need to test this. The following assertion, Assert.assertTrue("NM started before updater triggered", myUpdater.isTriggered());, covers it, I think. bq. Some of the tests probably dont need to wrap the fake ResourceTracker inside a RetryProxy. Only TestNodeStatusUpdaterRetryAndNMShutdown and testNMConnectionToRM use the ResourceTracker inside a RetryProxy, which I think they should. Agreed with the other comments. Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
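The RetryProxy pattern this JIRA builds on (Hadoop's org.apache.hadoop.io.retry.RetryProxy) wraps a protocol interface in a java.lang.reflect.Proxy that re-invokes failed calls under a policy. The general mechanism can be sketched with only the JDK; the names below are illustrative, not Hadoop's actual API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.net.ConnectException;

public class RetryProxyDemo {
    /** Stand-in for the real NM-to-RM protocol interface. */
    public interface ResourceTracker {
        String registerNodeManager(String nodeId) throws Exception;
    }

    /** Wraps target so each call is retried up to maxAttempts times on any exception. */
    @SuppressWarnings("unchecked")
    public static <T> T withRetries(Class<T> iface, T target, int maxAttempts) {
        InvocationHandler h = (proxy, method, args) -> {
            Throwable last = null;
            for (int i = 0; i < maxAttempts; i++) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause();   // remember the real failure, then retry
                }
            }
            throw last != null ? last : new IllegalStateException("no attempts made");
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, h);
    }

    /** A tracker that fails its first two calls, as if the RM were still restarting. */
    public static class FlakyTracker implements ResourceTracker {
        public int calls = 0;
        public String registerNodeManager(String nodeId) throws Exception {
            if (++calls < 3) throw new ConnectException("RM not up yet");
            return "registered:" + nodeId;
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyTracker flaky = new FlakyTracker();
        ResourceTracker tracker = withRetries(ResourceTracker.class, flaky, 3);
        System.out.println(tracker.registerNodeManager("nm1") + " after " + flaky.calls + " attempts");
    }
}
```

Because the retry lives in the proxy, every protocol method gets the same wait-for-RM behavior without each caller repeating the loop, which is exactly why a common proxy client is worth factoring out.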
[jira] [Commented] (YARN-513) Create common proxy client for communicating with RM
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702452#comment-13702452 ] Jian He commented on YARN-513: -- bq. Does this need an @VisibleForTesting flag? Yes, we need to mark as @VisibleForTesting. (disregard my earlier comment) Create common proxy client for communicating with RM Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Jian He Attachments: YARN-513.10.patch, YARN-513.11.patch, YARN-513.12.patch, YARN-513.13.patch, YARN-513.1.patch, YARN-513.2.patch, YARN-513.3.patch, YARN-513.4.patch, YARN.513.5.patch, YARN-513.6.patch, YARN-513.7.patch, YARN-513.8.patch, YARN-513.9.patch When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702456#comment-13702456 ] Omkar Vinit Joshi commented on YARN-299: I guess the patch looks good overall. However, we need an additional fix for a related case. The root cause is more evident in the YARN-820 logs: a container requests multiple resources, and RESOURCE_LOCALIZED / RESOURCE_FAILED events may arrive for one or more of them between the container receiving its first RESOURCE_FAILED event and deregistering itself from the remaining resources. Therefore we might see RESOURCE_FAILED / RESOURCE_LOCALIZED events sent to ContainerImpl when it is in the DONE state (for different resources). So, like RESOURCE_FAILED, we should also ignore the RESOURCE_LOCALIZED event. I could see one more issue in the logs; it would be great if we fix that too as part of this jira, as it looks like a quick change. Here LOG.info calls toString on LocalizedResource, which is not thread-safe for ref (a LinkedList used internally). I guess grabbing the write lock inside toString would protect it from such exceptions; we need to check the other state machines as well. {code}
} catch (ExecutionException e) {
  LOG.info("Failed to download rsrc " + assoc.getResource(), e.getCause());
  LocalResourceRequest req = assoc.getResource().getRequest();
  publicRsrc.handle(new ResourceFailedLocalizationEvent(req, e.getMessage()));
  assoc.getResource().unlock();
{code} Any thoughts? 
Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch {code:xml}
2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null
{code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
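The fix pattern being discussed (treating the straggler events as no-op self-transitions at DONE instead of letting the state machine throw) can be sketched with a minimal transition table. In the real code this is an addTransition entry in ContainerImpl's StateMachineFactory; the enum and class names below are simplified stand-ins:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class IgnoreEventDemo {
    public enum State { LOCALIZING, RUNNING, DONE }
    public enum Event { RESOURCE_LOCALIZED, RESOURCE_FAILED, CONTAINER_KILLED }

    // Events that are harmless stragglers once a terminal state is reached.
    private static final Map<State, EnumSet<Event>> IGNORED = new EnumMap<>(State.class);
    static {
        // Late localization outcomes for *other* resources may still arrive after DONE.
        IGNORED.put(State.DONE, EnumSet.of(Event.RESOURCE_LOCALIZED, Event.RESOURCE_FAILED));
    }

    private State current = State.LOCALIZING;

    public State handle(Event e) {
        EnumSet<Event> ignored = IGNORED.get(current);
        if (ignored != null && ignored.contains(e)) {
            return current;            // self-transition: log-and-ignore instead of throwing
        }
        switch (current) {
            case LOCALIZING:
                if (e == Event.RESOURCE_LOCALIZED) return current = State.RUNNING;
                if (e == Event.RESOURCE_FAILED)    return current = State.DONE;
                break;
            case RUNNING:
                if (e == Event.CONTAINER_KILLED)   return current = State.DONE;
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event: " + e + " at " + current);
    }

    public static void main(String[] args) {
        IgnoreEventDemo c = new IgnoreEventDemo();
        System.out.println(c.handle(Event.RESOURCE_FAILED));     // enters DONE
        System.out.println(c.handle(Event.RESOURCE_LOCALIZED));  // ignored, stays DONE
    }
}
```

Registering the ignore explicitly, per state and per event, keeps genuinely invalid transitions loud while silencing only the known-benign race.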
[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702464#comment-13702464 ] Zhijie Shen commented on YARN-296: -- The patch should work, but IMHO the essential problem is that APP_ACCEPTED is not expected at RUNNING. APP_ACCEPTED is created during ScheduleTransition of a RMAppAttempt, and is consumed when a RMApp moves from SUBMITTED to ACCEPTED. Only after the RMApp enters ACCEPTED can it move on to RUNNING (similar for UnmanagedAM). Therefore, APP_ACCEPTED shouldn't be seen when the RMApp is at RUNNING. Moreover, it seems impossible that APP_ACCEPTED belongs to the last RMAppAttempt if the RMApp is retrying, as a retry can only happen after the RMApp enters ACCEPTED, by which point the APP_ACCEPTED produced by the last RMAppAttempt has already been consumed. [~devaraj], would you mind posting more context around the InvalidStateTransitonException, so that we can dig deeper into the problem? Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl Key: YARN-296 URL: https://issues.apache.org/jira/browse/YARN-296 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch {code:xml}
2012-12-28 11:14:47,671 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-727: --- Attachment: YARN-727.18.patch ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702511#comment-13702511 ] Xuan Gong commented on YARN-727: Thanks for the comments. Created a new patch to address the comments ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
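The behavior being added here, server-side filtering of getApplications by application type, amounts to the pattern below, sketched over plain collections rather than the real ApplicationClientProtocol records (class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AppTypeFilterDemo {
    /** Minimal stand-in for an ApplicationReport (hypothetical fields). */
    public static class AppReport {
        public final String id;
        public final String type;
        public AppReport(String id, String type) { this.id = id; this.type = type; }
    }

    /** Empty filter means "all types", matching the unfiltered getAllApplications behavior. */
    public static List<AppReport> getApplications(List<AppReport> all, Set<String> types) {
        if (types == null || types.isEmpty()) return all;
        return all.stream()
                  .filter(a -> types.contains(a.type))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<AppReport> all = Arrays.asList(
            new AppReport("app_1", "MAPREDUCE"), new AppReport("app_2", "SPARK"));
        System.out.println(getApplications(all, Collections.singleton("MAPREDUCE")).get(0).id);
    }
}
```

Treating an empty type set as "no filter" keeps the new parameter backward compatible with existing callers of the unfiltered API.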
[jira] [Updated] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-369: --- Attachment: YARN-369-trunk-4.patch Thanks [~bikassaha] for the review. Incorporated all the comments. Attaching the latest patch. Thanks, Mayank Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped.
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
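The guard Hitesh describes boils down to tracking registration per attempt and failing fast in allocate, instead of letting the bad event die silently in the state machine. A minimal sketch with hypothetical names (the real change lives in ApplicationMasterService):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RegisterGuardDemo {
    /** Thrown back to the AM instead of dropping an invalid STATUS_UPDATE event. */
    public static class ApplicationMasterNotRegisteredException extends RuntimeException {
        public ApplicationMasterNotRegisteredException(String msg) { super(msg); }
    }

    // Attempts that have completed registerApplicationMaster.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    public void registerApplicationMaster(String attemptId) {
        registered.add(attemptId);
    }

    public String allocate(String attemptId) {
        if (!registered.contains(attemptId)) {
            throw new ApplicationMasterNotRegisteredException(
                "AM " + attemptId + " called allocate before registering");
        }
        return "ok";   // would build an AllocateResponse in the real service
    }

    public static void main(String[] args) {
        RegisterGuardDemo svc = new RegisterGuardDemo();
        try {
            svc.allocate("attempt_1");
        } catch (ApplicationMasterNotRegisteredException e) {
            System.out.println(e.getMessage());
        }
        svc.registerApplicationMaster("attempt_1");
        System.out.println(svc.allocate("attempt_1"));
    }
}
```

Raising a typed exception gives the misbehaving AM an actionable error at the RPC boundary, rather than a server-side log line nobody sees.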
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702514#comment-13702514 ] Xuan Gong commented on YARN-727: Related MR changes are in https://issues.apache.org/jira/browse/MAPREDUCE-5325 Most of the MR changes are for the API rename ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-295: --- Attachment: YARN-295-trunk-3.patch Thanks [~zjshen] for review. Updated the patch. Thanks, Mayank Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
at java.lang.Thread.run(Thread.java:662)
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702529#comment-13702529 ] Hadoop QA commented on YARN-791: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591279/YARN-791-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1428//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1428//console This message is automatically generated. 
Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702532#comment-13702532 ] Omkar Vinit Joshi commented on YARN-814: * Instead of putting string messages saying where this log came from, we can use the log4j %L option if suitable/required. [http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html] Similarly for other LOG.warn places. {code}
- LOG.warn("Exit code from container is : " + exitCode);
+ LOG.warn("Exit code from LinuxContainerExecutor's deleteAsUser is : " + exitCode);
{code} * I guess it will be more helpful if we add the containerId there (locId = containerId). The rest of the patch looks good. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
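Omkar's suggestion refers to log4j 1.2 PatternLayout's location conversion characters: %L emits the line number and %M the method name of the call site, so log messages need not hand-encode their origin. A hypothetical log4j.properties fragment (the appender name here is invented for illustration):

```properties
# %M and %L make log4j record the calling method and line automatically.
# NOTE: location capture (%l, %L, %M, %F) is documented as expensive.
log4j.appender.NM.layout=org.apache.log4j.PatternLayout
log4j.appender.NM.layout.ConversionPattern=%d{ISO8601} %p %c{2} (%M:%L) - %m%n
```

The documented cost of location capture is why projects often keep it out of hot paths and reserve it for warn/error appenders.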
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702537#comment-13702537 ] Vinod Kumar Vavilapalli commented on YARN-791: -- The latest patch looks good to me. Checking it in. Regarding the CLI stuff, can you look at the latest patch at YARN-727? We will need to add something like --node-state similar to --app-type there. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702540#comment-13702540 ] Hadoop QA commented on YARN-369: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591295/YARN-369-trunk-4.patch against trunk revision. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1429//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1429//console This message is automatically generated. 
Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
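The fix YARN-369 asks for — rejecting calls from unregistered application masters with a proper error instead of a silently dropped state-machine exception — can be sketched in standalone form (all class and method names here are hypothetical, not the actual patch):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone sketch: guard allocate/status-update calls so
// that an attempt which never registered gets an explicit error back,
// instead of triggering an InvalidStateTransitonException that the
// dispatcher silently drops.
public class AmGuardSketch {
  private final Set<String> registeredAttempts = new HashSet<>();

  void registerApplicationMaster(String attemptId) {
    registeredAttempts.add(attemptId);
  }

  String allocate(String attemptId) {
    if (!registeredAttempts.contains(attemptId)) {
      // Surface a clear error to the caller rather than feeding the
      // attempt's state machine an event it cannot handle.
      throw new IllegalStateException(
          "Application Master is trying to allocate before registering: "
              + attemptId);
    }
    return "allocated for " + attemptId;
  }

  public static void main(String[] args) {
    AmGuardSketch svc = new AmGuardSketch();
    svc.registerApplicationMaster("attempt_1");
    System.out.println(svc.allocate("attempt_1"));
    try {
      svc.allocate("attempt_2"); // never registered
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```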
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702543#comment-13702543 ] Hitesh Shah commented on YARN-727: --
- Use MRJobConfig.MR_APPLICATION_TYPE instead of MAPREDUCE.
- The documentation for ApplicationClientProtocol#getApplications does not seem to be correct. It does not mention usage in terms of filtering based on criteria defined in the request object. It mentions that it returns only running applications - is that correct?
- Likewise in YarnClient. Doc changes:
{code}
+ * Get a report (ApplicationReport) of Applications
+ * about the given applicationTypes in the cluster.
{code}
The above should be reworded to something along the lines of getting reports of applications matching the given application types.
- In YarnClientImpl, should code be re-used across the 2 getApplication* functions?
- Why does APP_TYPE_CMD need to be in YarnCLI and not in ApplicationCLI?
- param appTypes - please add more docs for app types in the various places where this is an argument.
- In GetApplicationsRequestPBImpl, what happens if setApplicationTypes is called twice - the first with a non-empty set and the second call with a null? 
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
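Hitesh's last question above — what should happen when setApplicationTypes is called twice, first with a non-empty set and then with null — can be sketched in standalone form (the class below is illustrative, not the real GetApplicationsRequestPBImpl):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the setter semantics under review: a second call
// must fully replace the first call's state, so passing null clears any
// previously stored types instead of leaving stale values behind (the
// failure mode Hitesh is probing for in the PBImpl builder).
public class AppTypesRequestSketch {
  private Set<String> applicationTypes = null;

  public void setApplicationTypes(Set<String> types) {
    // Defensive copy; null means "no filter", discarding earlier values.
    this.applicationTypes = (types == null) ? null : new HashSet<>(types);
  }

  public Set<String> getApplicationTypes() {
    return applicationTypes == null
        ? Collections.emptySet()
        : Collections.unmodifiableSet(applicationTypes);
  }

  public static void main(String[] args) {
    AppTypesRequestSketch req = new AppTypesRequestSketch();
    req.setApplicationTypes(
        new HashSet<>(Collections.singleton("MAPREDUCE")));
    req.setApplicationTypes(null); // the second call must win
    System.out.println(req.getApplicationTypes().isEmpty());
  }
}
```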
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702547#comment-13702547 ] Hudson commented on YARN-791: - Integrated in Hadoop-trunk-Commit #4051 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4051/]) YARN-791. Changed RM APIs and web-services related to nodes to ensure that both are consistent with each other. Contributed by Sandy Ryza. (Revision 1500994) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1500994 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetClusterNodesRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/YarnClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/NodeCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetClusterNodesRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702565#comment-13702565 ] Sandy Ryza commented on YARN-791: - Thanks Vinod! I initially tried something similar to what I now see was done in YARN-727, but wasn't satisfied with it. When you ask for usage you will get something like
{code}
 -list      Lists all the nodes in the cluster
 -status    Prints the status report of the node
 -states    A list of states to filter on
{code}
which didn't make sense to me because -states is a parameter to the -list command. Does this seem fine to you? Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702572#comment-13702572 ] Hadoop QA commented on YARN-295: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591298/YARN-295-trunk-3.patch against trunk revision. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1430//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1430//console This message is automatically generated. 
Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-904) Enable multiple QOP for ResourceManager
Benoy Antony created YARN-904: - Summary: Enable multiple QOP for ResourceManager Key: YARN-904 URL: https://issues.apache.org/jira/browse/YARN-904 Project: Hadoop YARN Issue Type: Improvement Reporter: Benoy Antony Currently the ResourceManager supports only a single QOP. This feature makes the ResourceManager listen on two ports for RPC: one RPC port supports authentication only, the other RPC port supports privacy. Please see HADOOP-9709 for general requirements. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702585#comment-13702585 ] Vinod Kumar Vavilapalli commented on YARN-791: -- Yeah, I too saw that. Fundamentally the problem is that hadoop's commands don't follow standard unix patterns. Ideally we should have list, status, states as sub-commands. So you would say yarn application list -all, yarn application list -states RUNNING etc. I should have fixed these when I originally added the yarn CLI, but it may be too late now. It should be okay like that, or we can completely hijack the help message instead of using the Apache Commons utils. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-791) Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API
[ https://issues.apache.org/jira/browse/YARN-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702595#comment-13702595 ] Sandy Ryza commented on YARN-791: - Totally agree. Filed YARN-905 and we can figure out the best approach there. Ensure that RM RPC APIs that return nodes are consistent with /nodes REST API - Key: YARN-791 URL: https://issues.apache.org/jira/browse/YARN-791 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Blocker Fix For: 2.1.0-beta Attachments: YARN-791-1.patch, YARN-791-2.patch, YARN-791-3.patch, YARN-791-4.patch, YARN-791-5.patch, YARN-791-6.patch, YARN-791-7.patch, YARN-791-8.patch, YARN-791.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-905) Add state filters to nodes CLI
Sandy Ryza created YARN-905: --- Summary: Add state filters to nodes CLI Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-905: Assignee: (was: Sandy Ryza) Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-905: Assignee: Wei Yan Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-894: --- Hadoop Flags: Reviewed +1 for the patch. I'll commit this. I also cannot repro the problem that I saw earlier. I see no obvious file handle leaks in the code. If the problem comes back, we can address it separately. NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
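The platform-independent idea behind the fix can be sketched in standalone form: decide "timed out" from an explicit flag set by the timer task rather than from whether reading the child's output happened to throw (Linux) or just hit EOF (Windows). All names below are illustrative, not the actual YARN-894 patch:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: classify the health-script outcome from a flag the
// timer task sets before destroying the process, so Linux (reader throws
// "Stream closed") and Windows (reader just returns -1 / EOF) both yield
// the same TIMED_OUT status.
public class TimeoutCheckSketch {
  private final AtomicBoolean timedOut = new AtomicBoolean(false);

  // Called by the timer task right before Process#destroy().
  void markTimedOut() {
    timedOut.set(true);
  }

  // Called after the output reader finishes, whether by EOF or exception.
  String classify(boolean readerThrew) {
    if (timedOut.get()) {
      return "TIMED_OUT"; // same answer on both platforms
    }
    return readerThrew ? "FAILED" : "HEALTHY";
  }

  public static void main(String[] args) {
    TimeoutCheckSketch s = new TimeoutCheckSketch();
    s.markTimedOut();
    // Windows-style EOF with no exception is still reported as a timeout.
    System.out.println(s.classify(false));
  }
}
```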
[jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702635#comment-13702635 ] Hudson commented on YARN-894: - Integrated in Hadoop-trunk-Commit #4053 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4053/]) YARN-894. NodeHealthScriptRunner timeout checking is inaccurate on Windows. Contributed by Chuan Liu. (Revision 1501016) Result = SUCCESS cnauroth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501016 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthScriptRunner.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeHealthService.java NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 
3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702646#comment-13702646 ] Omkar Vinit Joshi commented on YARN-814: * yeah... what I am saying is the user can enable it if he wants via log4j. * my bad... in deleteAsUser we should probably just remove the container message altogether. Also, the exitCode check there can be replaced with logging the exception all the time. * replace appId with containerId (locId) in startLocalizer. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context (before the actual user's process is launched), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-894: --- Component/s: nodemanager NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Fix For: 3.0.0, 2.1.0-beta Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we will set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we will catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we will also set the HealthChecker status to timeout. We have the following execution sequence in Shell: 1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout. 2) In the main thread, open a buffered reader and feed in the process's standard input stream. 3) When the timeout happens, the timer task will call {{Process#destroy()}} to kill the main process. On Linux, when the timeout happened and the process was killed, the buffered reader will throw an IOException with the message "Stream closed" in the main thread. On Windows, we don't have the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702689#comment-13702689 ] Sandy Ryza commented on YARN-366: - Rebased onto trunk again Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
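The parent-pointer idea in the YARN-366 proposal can be sketched in standalone form (class and field names below are hypothetical, not Sandy's patch):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the tracing proposal: each event keeps a pointer
// to the event that was being handled when it was dispatched, so when the
// dispatcher catches an exception it can walk the chain and log the causal
// sequence of events instead of an uninformative dispatcher-loop trace.
public class TracingEventSketch {
  static class Event {
    final String name;
    final Event parent; // event being handled when this one was dispatched

    Event(String name, Event parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  // Reconstruct the chain of events, oldest first.
  static String traceOf(Event e) {
    Deque<String> chain = new ArrayDeque<>();
    for (Event cur = e; cur != null; cur = cur.parent) {
      chain.push(cur.name);
    }
    return String.join(" -> ", chain);
  }

  public static void main(String[] args) {
    Event submit = new Event("APP_SUBMITTED", null);
    Event start = new Event("ATTEMPT_STARTED", submit);
    Event fail = new Event("CONTAINER_FINISHED", start);
    System.out.println(traceOf(fail));
    // APP_SUBMITTED -> ATTEMPT_STARTED -> CONTAINER_FINISHED
  }
}
```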
[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-366: Attachment: YARN-366-6.patch Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702693#comment-13702693 ] Xuan Gong commented on YARN-763: bq.On a different note, serviceStop() should not call join() on the heartbeater thread. While serviceStop() blocks on the join() it may be holding onto application locks in its call tree. The callback thread might be waiting on those locks as it upcalls to the app code. Resulting in a deadlock. However, we should ensure the JVM is not hung because of any issue on this thread. So we should mark the callback thread as a daemon so that the JVM exits even if that thread is running. If we set the callback as daemon thread, calling join() on the heartBeater thread will be fine. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
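The daemon-thread point in the comment above can be sketched like this (hypothetical class and method names, not the real AMRMClientAsync): marking the heartbeat thread as a daemon guarantees the JVM can still exit even if the thread is somehow left running, while stop() can safely interrupt and join it when the loop responds promptly to interruption.

```java
// Sketch (hypothetical names): a heartbeat loop on a daemon thread.
// Because the thread is a daemon, it cannot keep the JVM alive on its
// own; stop() can still interrupt and join it for an orderly shutdown.
class HeartbeaterSketch {
    private final Thread heartbeater;
    private volatile boolean keepRunning = true;

    HeartbeaterSketch() {
        heartbeater = new Thread(() -> {
            while (keepRunning) {
                try {
                    Thread.sleep(10); // stand-in for an allocate() heartbeat
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "AMRM Heartbeater");
        heartbeater.setDaemon(true); // JVM exit is not blocked by this thread
    }

    void start() {
        heartbeater.start();
    }

    void stop() {
        keepRunning = false;
        heartbeater.interrupt();
        try {
            heartbeater.join(); // safe here: the loop exits promptly on interrupt
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return heartbeater.isAlive();
    }

    public static void main(String[] args) {
        HeartbeaterSketch h = new HeartbeaterSketch();
        h.start();
        h.stop();
        System.out.println(h.isRunning()); // false after join() returns
    }
}
```

The deadlock risk Bikas describes arises only when join() waits on a thread that is itself blocked on locks held by the caller; a daemon flag does not remove that risk by itself, it only keeps the JVM from hanging at exit.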
[jira] [Updated] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-727: --- Attachment: YARN-727.19.patch ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702719#comment-13702719 ] Xuan Gong commented on YARN-727: Recreate the patch based on the latest trunk, and address all the latest comments. ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-661: --- Attachment: YARN-661-20130708.patch NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-347: Attachment: YARN-347-v2.patch YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702723#comment-13702723 ] Junping Du commented on YARN-347: - Sure. Rebase this patch against trunk in v2 patch. YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.3.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702753#comment-13702753 ] Hadoop QA commented on YARN-366: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591331/YARN-366-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.TestContainerManagerSecurity org.apache.hadoop.yarn.server.TestDiskFailures {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1431//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1431//console This message is automatically generated. 
Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702780#comment-13702780 ] Hadoop QA commented on YARN-763: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591349/YARN-763.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.async.impl.TestAMRMClientAsync {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1432//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1432//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702788#comment-13702788 ] Hadoop QA commented on YARN-661: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591342/YARN-661-20130708.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1434//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1434//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1434//console This message is automatically generated. 
NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
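The failure mode described above follows from POSIX semantics: removing a directory entry requires write permission on the parent directory, not ownership of the entry being removed. A toy model of that rule (not NodeManager code; user names are illustrative):

```java
// Toy model of why deleting usercache/<user> as that user fails:
// POSIX unlink/rmdir requires write permission on the PARENT directory,
// regardless of who owns the entry being removed.
class DeletePermissionModel {
    static boolean canDelete(String actor, String parentOwner,
                             boolean parentWritableByOthers) {
        return actor.equals(parentOwner) || parentWritableByOthers;
    }

    public static void main(String[] args) {
        // The top-level usercache dir is owned by the NM user ("yarn") and is
        // not writable by others, so "alice" cannot remove usercache/alice
        // even though she owns it...
        System.out.println(canDelete("alice", "yarn", false)); // false
        // ...while the NM user can.
        System.out.println(canDelete("yarn", "yarn", false)); // true
    }
}
```

This is why deletion-as-user fails on startup: the per-user directory is owned by the user, but its parent is not writable by that user.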
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702792#comment-13702792 ] Hadoop QA commented on YARN-347: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591343/YARN-347-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1435//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1435//console This message is automatically generated. YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info are taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info as memory info. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.4.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.5.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702799#comment-13702799 ] Xuan Gong commented on YARN-763: fix the test case failure AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-727) ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter
[ https://issues.apache.org/jira/browse/YARN-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702812#comment-13702812 ] Hadoop QA commented on YARN-727: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591339/YARN-727.19.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1433//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1433//console This message is automatically generated. 
ClientRMProtocol.getAllApplications should accept ApplicationType as a parameter Key: YARN-727 URL: https://issues.apache.org/jira/browse/YARN-727 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Xuan Gong Priority: Blocker Attachments: YARN-727.10.patch, YARN-727.11.patch, YARN-727.12.patch, YARN-727.13.patch, YARN-727.14.patch, YARN-727.15.patch, YARN-727.16.patch, YARN-727.17.patch, YARN-727.18.patch, YARN-727.19.patch, YARN-727.1.patch, YARN-727.2.patch, YARN-727.3.patch, YARN-727.4.patch, YARN-727.5.patch, YARN-727.6.patch, YARN-727.7.patch, YARN-727.8.patch, YARN-727.9.patch Now that an ApplicationType is registered on ApplicationSubmission, getAllApplications should be able to use this string to query for a specific application type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
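The proposed filtering can be sketched as follows (toy types and method names, not the actual ClientRMProtocol API): given a set of application types, return only matching applications, with an empty filter set meaning "return everything".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of application-type filtering (hypothetical names): an empty or
// null type set returns all applications; otherwise only matching types.
class AppTypeFilterSketch {
    static class App {
        final String id;
        final String type;

        App(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    static List<App> getApplications(List<App> all, Set<String> types) {
        if (types == null || types.isEmpty()) {
            return new ArrayList<>(all);
        }
        List<App> out = new ArrayList<>();
        for (App a : all) {
            if (types.contains(a.type)) {
                out.add(a);
            }
        }
        return out;
    }
}
```

Filtering on the RM side rather than in the client keeps the response small when a cluster has many applications of other types.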
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702821#comment-13702821 ] Hadoop QA commented on YARN-763: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591357/YARN-763.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1436//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1436//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-592) Container logs lost for the application when NM gets restarted
[ https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702823#comment-13702823 ] Omkar Vinit Joshi commented on YARN-592: I just looked at your patch; I need more information to understand it better. * Are you assuming that after the NM restarts, an application whose containers were running on that NodeManager will again get new containers on the same NodeManager? At present the NM doesn't remember the applications that were running on it across a restart, and the RM doesn't inform the NM about all the running applications in the cluster. * Across an NM restart, an application might still be running or it might have just finished before the restart. Do you want to upload the logs for both scenarios? At present we upload logs only when the application finishes... Container logs lost for the application when NM gets restarted -- Key: YARN-592 URL: https://issues.apache.org/jira/browse/YARN-592 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.3-alpha Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-592.patch While running a big job, if the NM goes down for some reason and comes back, it will do log aggregation for the newly launched containers and delete all the containers for the application. In this case we don't get the container logs, from HDFS or locally, for the containers that were launched before the restart and completed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-644) Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer
[ https://issues.apache.org/jira/browse/YARN-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702870#comment-13702870 ] Joseph Kniest commented on YARN-644: I'm new to MapReduce2/YARN development and not yet familiar with the source. Would this simply be exiting the function with a return upon detecting a null object? Basic null check is not performed on passed in arguments before using them in ContainerManagerImpl.startContainer - Key: YARN-644 URL: https://issues.apache.org/jira/browse/YARN-644 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Priority: Minor I see that validation/null checks are not performed on passed-in parameters, e.g. tokenId.getContainerID().getApplicationAttemptId() inside ContainerManagerImpl.authorizeRequest(). I guess we should add these checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
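Rather than silently returning on a null, argument validation usually fails fast with a descriptive exception so the caller gets a clear error instead of a NullPointerException from deep inside the call tree. A hedged sketch with stub types (not the real YARN classes):

```java
// Sketch (stub types, not the real YARN classes): validate the token's
// id chain up front and fail fast with a descriptive exception instead
// of letting a NullPointerException surface inside authorizeRequest().
class StartContainerValidation {
    static class ApplicationAttemptId {}

    static class ContainerId {
        final ApplicationAttemptId attemptId;
        ContainerId(ApplicationAttemptId a) { attemptId = a; }
        ApplicationAttemptId getApplicationAttemptId() { return attemptId; }
    }

    static class ContainerTokenIdentifier {
        final ContainerId containerId;
        ContainerTokenIdentifier(ContainerId c) { containerId = c; }
        ContainerId getContainerID() { return containerId; }
    }

    static ApplicationAttemptId validate(ContainerTokenIdentifier tokenId) {
        if (tokenId == null) {
            throw new IllegalArgumentException("null container token");
        }
        ContainerId cid = tokenId.getContainerID();
        if (cid == null) {
            throw new IllegalArgumentException("token has no container id");
        }
        ApplicationAttemptId attempt = cid.getApplicationAttemptId();
        if (attempt == null) {
            throw new IllegalArgumentException("container id has no app attempt id");
        }
        return attempt;
    }
}
```

Whether to throw or to return an error response over the protocol is a design choice; the key point is checking each link of the chain before dereferencing it.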
[jira] [Updated] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-299: --- Attachment: YARN-299-trunk-2.patch Updating the patch. Thanks, Mayank Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
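One common fix pattern for an "Invalid event: RESOURCE_FAILED at DONE" error is to register an explicit no-op (self-)transition for late events arriving at a terminal state. A toy state machine illustrating the pattern (not the real StateMachineFactory, and not necessarily what the attached patch does):

```java
import java.util.HashMap;
import java.util.Map;

// Toy state machine illustrating the fix pattern: register the late
// RESOURCE_FAILED event as an explicit self-transition at DONE, so it
// no longer raises an invalid-transition error.
class ContainerStateSketch {
    enum State { RUNNING, DONE }
    enum Event { RESOURCE_FAILED, CONTAINER_DONE }

    static final Map<String, State> TRANSITIONS = new HashMap<>();
    static {
        TRANSITIONS.put(State.RUNNING + ":" + Event.CONTAINER_DONE, State.DONE);
        TRANSITIONS.put(State.RUNNING + ":" + Event.RESOURCE_FAILED, State.DONE);
        // The fix: a late RESOURCE_FAILED at DONE is a harmless no-op.
        TRANSITIONS.put(State.DONE + ":" + Event.RESOURCE_FAILED, State.DONE);
    }

    static State handle(State current, Event e) {
        State next = TRANSITIONS.get(current + ":" + e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + current);
        }
        return next;
    }
}
```

Such late events are expected in an asynchronous dispatcher: a localization failure can be queued while the container is already transitioning to DONE, so the terminal state must tolerate it.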
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702918#comment-13702918 ] Hadoop QA commented on YARN-299:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591378/YARN-299-trunk-2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1437//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1437//console This message is automatically generated. 
Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702933#comment-13702933 ] Xuan Gong commented on YARN-875: For the callback, we can catch Throwable and call handler.onError(Exception). This tells the ApplicationMaster to jump out of its loop and go into the finish function. Eventually, AMRMClientAsync will call unregisterApplicationMaster and set the keepRunning flag to false, which stops the heartbeat thread. But we can let the heartbeat thread stop a little earlier. Option one: inside the catch block, call heartBeatThread.interrupt() and set keepRunning = false. Option two: define a volatile Exception savedCallBackException, set it inside the catch block, and inside heartBeatThread.run(), before we do the allocate(), always check whether savedCallBackException is null. [~bikassaha] any other suggestions? Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Currently that thread will die and then never call back, so the app can hang. A possible solution is to catch Throwable in the callback and then call client.onError().
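The two options in the comment above can be combined in a small sketch. This is a hypothetical, simplified stand-in for the AMRMClientAsync internals, not the real implementation; the names savedCallbackException, keepRunning, and heartbeatOnce follow the comment's wording but are illustrative only:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of guarding an async client against a dying callback
// thread: catch Throwable, record the failure, stop heartbeating early,
// and surface the error to the handler.
public class CallbackGuardSketch {
    interface CallbackHandler { void onError(Exception e); }

    private volatile boolean keepRunning = true;
    private final AtomicReference<Throwable> savedCallbackException =
        new AtomicReference<>();
    private final CallbackHandler handler;

    CallbackGuardSketch(CallbackHandler handler) { this.handler = handler; }

    // Invoked on the callback thread: never let a Throwable kill it silently.
    void runCallback(Runnable userCallback) {
        try {
            userCallback.run();
        } catch (Throwable t) {
            savedCallbackException.set(t);     // option two: remember the failure
            keepRunning = false;               // option one: stop the heartbeat early
            handler.onError(new Exception(t)); // let the AM exit its loop and finish
        }
    }

    // One heartbeat iteration; skips allocate() once an error was recorded.
    boolean heartbeatOnce() {
        if (!keepRunning || savedCallbackException.get() != null) {
            return false; // thread winds down without another allocate()
        }
        // ... allocate() would go here ...
        return true;
    }

    public static void main(String[] args) {
        CallbackGuardSketch s = new CallbackGuardSketch(e -> System.out.println("onError: " + e));
        s.runCallback(() -> { throw new RuntimeException("callback failed"); });
        System.out.println("heartbeat continues: " + s.heartbeatOnce());
    }
}
```

The volatile flag plus the AtomicReference check before each allocate() is what keeps the heartbeat thread from spinning on behalf of an application that can no longer make progress.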
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702938#comment-13702938 ] Xuan Gong commented on YARN-873: [~bikassaha] any comments? YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong How can the client find out that the app does not exist?
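Until the server side signals unknown applications explicitly, the caller's only hint is the null report itself. A hedged sketch of a client-side guard that converts the null into an explicit error; ReportSource and requireReport are hypothetical stand-ins, and the only behavior taken from the issue is that an unknown appId yields null:

```java
// Hedged sketch: wrap a client that returns null for unknown application
// IDs so callers get an explicit exception instead of a null report.
public class UnknownAppSketch {
    // Hypothetical stand-in for the client's report lookup.
    interface ReportSource { String getApplicationReport(String appId); }

    static String requireReport(ReportSource client, String appId) {
        String report = client.getApplicationReport(appId);
        if (report == null) {
            // Surface "app does not exist" explicitly instead of a null.
            throw new IllegalArgumentException("Unknown application: " + appId);
        }
        return report;
    }

    public static void main(String[] args) {
        ReportSource stub = id -> id.equals("app_1") ? "report for app_1" : null;
        System.out.println(requireReport(stub, "app_1"));
    }
}
```

A dedicated exception type thrown by the server would of course be cleaner, since a wrapper like this cannot distinguish "unknown app" from any other reason a report might be missing.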
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702940#comment-13702940 ] Xuan Gong commented on YARN-643: [~vinodkv] any comments? WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong
[jira] [Commented] (YARN-296) Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl
[ https://issues.apache.org/jira/browse/YARN-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702942#comment-13702942 ] Mayank Bansal commented on YARN-296: Thanks [~zjshen] and [~ojoshi] for the review. I think what [~vinodkv] mentioned here can cause this: https://issues.apache.org/jira/browse/YARN-295?focusedCommentId=13675472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13675472 Thoughts? Thanks, Mayank Resource Manager throws InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING for RMAppImpl Key: YARN-296 URL: https://issues.apache.org/jira/browse/YARN-296 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-296-trunk-1.patch, YARN-296-trunk-2.patch {code:xml} 2012-12-28 11:14:47,671 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APP_ACCEPTED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:528) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:72) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:405) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:389) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code}