[jira] [Assigned] (YARN-506) Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute
[ https://issues.apache.org/jira/browse/YARN-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic reassigned YARN-506: --- Assignee: Ivan Mitic Move to common utils FileUtil#setReadable/Writable/Executable and FileUtil#canRead/Write/Execute Key: YARN-506 URL: https://issues.apache.org/jira/browse/YARN-506 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Ivan Mitic Assignee: Ivan Mitic Attachments: YARN-506.commonfileutils.patch Move to common utils described in HADOOP-9413 that work well cross-platform. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
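As context for what the switch looks like in practice, here is a minimal sketch assuming the HADOOP-9413 helpers mirror java.io.File's boolean-returning signatures; the LocalDirPermissions class and makePrivate method are illustrative, not part of the patch.
{code}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.FileUtil;

// Sketch only: replace direct java.io.File permission calls, which behave
// differently on Windows, with the common FileUtil helpers from HADOOP-9413.
public class LocalDirPermissions {
  public static void makePrivate(File dir) throws IOException {
    // Instead of dir.setReadable(false)/setWritable(false)/setExecutable(false).
    if (!FileUtil.setReadable(dir, false)
        || !FileUtil.setWritable(dir, false)
        || !FileUtil.setExecutable(dir, false)) {
      throw new IOException("Failed to restrict permissions on " + dir);
    }
    // Cross-platform check replacing dir.canRead().
    System.out.println(dir + " readable after restriction: " + FileUtil.canRead(dir));
  }
}
{code}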
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.8.patch Had offline discussion with Bikas and Hitesh. We agreed to simplify the solution, and isolate it from the fix of YARN-382. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616325#comment-13616325 ] Robert Joseph Evans commented on YARN-112: -- I agree that scale exposes races, but the underlying problem is still that we want to create a new unique directory. This seems very simple.
{code}
File uniqueDir = null;
do {
  uniqueDir = new File(baseDir, String.valueOf(rand.nextLong()));
} while (!uniqueDir.mkdir());
{code}
I don't see why we are going through all of this complexity simply because a FileContext API is broken. Playing games to make the race less likely is fine. But ultimately we still have to handle the race. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616327#comment-13616327 ] Robert Joseph Evans commented on YARN-112: -- Oh and the latest patch using a unique number will not always work, because the same code is used from different processes on the same box. We would have to have a way to guarantee uniqueness between the different processes. CurrentTimeMillis helps but still could result in a race. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-512) Log aggregation root directory check is more expensive than it needs to be
Jason Lowe created YARN-512: --- Summary: Log aggregation root directory check is more expensive than it needs to be Key: YARN-512 URL: https://issues.apache.org/jira/browse/YARN-512 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.5-beta Reporter: Jason Lowe Priority: Minor The log aggregation root directory check first does an {{exists}} call followed by a {{getFileStatus}} call. That effectively stats the file twice. It should just use {{getFileStatus}} and catch {{FileNotFoundException}} to handle the non-existent case. In addition we may consider caching the presence of the directory rather than checking it each time a node aggregates logs for an application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
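A minimal sketch of the single-stat pattern the report suggests; the class and method names are illustrative, not the actual LogAggregationService code.
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: one getFileStatus call replaces the exists() + getFileStatus()
// pair, with FileNotFoundException standing in for the non-existent case.
public class LogAggregationDirCheck {
  public static FileStatus statRootDirOnce(FileSystem fs, Path remoteRootLogDir)
      throws IOException {
    try {
      return fs.getFileStatus(remoteRootLogDir); // single stat of the directory
    } catch (FileNotFoundException e) {
      return null;                               // directory does not exist
    }
  }
}
{code}
Caching the returned status (or its absence) per application, as the report also suggests, would then avoid re-statting the directory on every aggregation.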
[jira] [Commented] (YARN-509) ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters
[ https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616415#comment-13616415 ] Roman Shaposhnik commented on YARN-509: --- I totally agree that it needs to be investigated. That said, if we have to rush 2.0.4-alpha I'd say the proposed patch might be a reasonable workaround. ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters --- Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 3.0.0, 2.0.4-alpha Attachments: YARN-509.patch.txt During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that ResourceTrackerPB hasn't been annotated with {{@KerberosInfo}} nor {{@TokenInfo}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
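For reference, the kind of annotation being discussed looks roughly like the sketch below. The serverPrincipal config key shown is an assumption, and the real ResourceTrackerPB also extends the protobuf-generated blocking interface, which is omitted here.
{code}
import org.apache.hadoop.security.KerberosInfo;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical sketch: tagging a protocol interface so the RPC layer can
// authorize Kerberos clients against the expected server principal.
@KerberosInfo(serverPrincipal = YarnConfiguration.RM_PRINCIPAL)
public interface ResourceTrackerPB {
  // The protobuf service methods (registerNodeManager, nodeHeartbeat, ...)
  // live on the generated BlockingInterface that the real interface extends.
}
{code}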
[jira] [Commented] (YARN-112) Race in localization can cause containers to fail
[ https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616447#comment-13616447 ] Vinod Kumar Vavilapalli commented on YARN-112: -- bq. Playing games to make the race less likely is fine. But ultimately we still have to handle the race. bq. Oh and the latest patch using a unique number will not always work, because the same code is used from different processes on the same box. Bobby, the unique number generation is done in one single process and communicated down. ResourceTrackerService (NodeManager process) generates the unique path and passes it down to FSDownload (Localizer process), so we can avoid the race altogether. Race in localization can cause containers to fail - Key: YARN-112 URL: https://issues.apache.org/jira/browse/YARN-112 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, yarn-112-20130326.patch, yarn-112.20131503.patch On one of our 0.23 clusters, I saw a case of two containers, corresponding to two map tasks of a MR job, that were launched almost simultaneously on the same node. It appears they both tried to localize job.jar and job.xml at the same time. One of the containers failed when it couldn't rename the temporary job.jar directory to its final name because the target directory wasn't empty. Shortly afterwards the second container failed because job.xml could not be found, presumably because the first container removed it when it cleaned up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-450) Define value for * in the scheduling protocol
[ https://issues.apache.org/jira/browse/YARN-450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-450: - Attachment: YARN-450_7.patch isAnyLocation returns boolean instead of Boolean now. Define value for * in the scheduling protocol - Key: YARN-450 URL: https://issues.apache.org/jira/browse/YARN-450 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: YARN-450_1.patch, YARN-450_2.patch, YARN-450_3.patch, YARN-450_4.patch, YARN-450_5.patch, YARN-450_6.patch, YARN-450_7.patch The ResourceRequest has a string field to specify node/rack locations. For the cross-rack/cluster-wide location (ie when there is no locality constraint) the * string is used everywhere. However, its not defined anywhere and each piece of code either defines a local constant or uses the string literal. Defining * in the protocol and removing other local references from the code base will be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
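A minimal sketch of the shape of the change discussed here, assuming the constant lands on ResourceRequest as described in the comments; exact names may differ from the committed patch.
{code}
// Sketch only: one constant for the no-locality resource name plus a
// primitive-boolean helper, replacing scattered "*" string literals.
public abstract class ResourceRequest {
  /** The resource name used when a request has no locality constraint. */
  public static final String ANY = "*";

  /** @return true if the given resource name means "any node or rack". */
  public static boolean isAnyLocation(String hostName) {
    return ANY.equals(hostName);
  }
}
{code}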
[jira] [Updated] (YARN-248) Restore RMDelegationTokenSecretManager state on restart
[ https://issues.apache.org/jira/browse/YARN-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-248: Assignee: Bikas Saha Restore RMDelegationTokenSecretManager state on restart --- Key: YARN-248 URL: https://issues.apache.org/jira/browse/YARN-248 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tom White Assignee: Bikas Saha On restart, the RM creates a new RMDelegationTokenSecretManager with fresh state. This will cause problems for Oozie jobs running on secure clusters since the delegation tokens stored in the job credentials (used by the Oozie launcher job to submit a job to the RM) will not be recognized by the RM, and recovery will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-513) Verify all clients will wait for RM to restart
Bikas Saha created YARN-513: --- Summary: Verify all clients will wait for RM to restart Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: jian he When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
Bikas Saha created YARN-514: --- Summary: Delayed store operations should not result in RM unavailability for app submission Key: YARN-514 URL: https://issues.apache.org/jira/browse/YARN-514 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Bikas Saha Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in RM being perceived as unavailable by clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
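As a rough illustration of the direction described above, the sketch below moves the blocking store call off the client-facing thread onto a dedicated store thread; all names are hypothetical and this is not the RMStateStore API.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: the submitApplication RPC handler enqueues the store work and
// returns, while a single background thread performs the (possibly slow)
// state-store write and then continues the submission flow.
public class AsyncAppStore {
  private final BlockingQueue<Runnable> storeOps = new LinkedBlockingQueue<>();

  public AsyncAppStore() {
    Thread storeThread = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          storeOps.take().run(); // slow store happens off the RPC thread
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "rm-state-store");
    storeThread.setDaemon(true);
    storeThread.start();
  }

  /** Called from the client-facing submitApplication handler. */
  public void submit(Runnable storeAndContinueSubmission) {
    storeOps.add(storeAndContinueSubmission); // client thread returns immediately
  }
}
{code}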
[jira] [Updated] (YARN-437) Update documentation of Writing Yarn Applications to match current best practices
[ https://issues.apache.org/jira/browse/YARN-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated YARN-437: - Attachment: YARN-437-3.patch Added a couple small points I missed before, and fixed (I hope) some formatting. Thanks. If anyone notices any misuses of the document formatting please let me know. Thanks! Update documentation of Writing Yarn Applications to match current best practices --- Key: YARN-437 URL: https://issues.apache.org/jira/browse/YARN-437 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Hitesh Shah Assignee: Eli Reisman Attachments: YARN-437-1.patch, YARN-437-2.patch, YARN-437-3.patch Should fix docs to point to usage of YarnClient and AMRMClient helper libs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616558#comment-13616558 ] Vinod Kumar Vavilapalli commented on YARN-467: -- Another thing I've been looking at hard is whether LocalResourceTracker.localizationCompleted() can be done away with completely in favour of the handle() method. But to do that we need to handle both successful and failing localizations via handle(). I can already see a couple of bugs related to localization failures, so let's do this separately. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path If we have multiple jobs which use the distributed cache with small files, the directory limit is reached before the cache size limit, and no more directories can be created in the file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) We need to have a mechanism wherein we can create a directory hierarchy and limit the number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
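A small sketch of the "directory hierarchy" idea from the description: map a growing localization id onto nested subdirectories so no single directory holds more than a fixed number of entries. The cap of 36 and the layout are illustrative assumptions, not what the eventual patch does.
{code}
// Sketch only: deterministic mapping from an increasing id to a nested
// relative path, keeping every directory under FILES_PER_DIR entries.
public class HierarchicalDirAllocator {
  private static final int FILES_PER_DIR = 36; // assumed per-directory cap

  /** e.g. with a cap of 36: id 0 -> "0", id 40 -> "4/1", id 1300 -> "4/0/1". */
  public static String relativePathFor(long id) {
    if (id < FILES_PER_DIR) {
      return Long.toString(id, FILES_PER_DIR);
    }
    StringBuilder path = new StringBuilder();
    long remaining = id;
    while (remaining >= FILES_PER_DIR) {
      path.append(Long.toString(remaining % FILES_PER_DIR, FILES_PER_DIR)).append('/');
      remaining /= FILES_PER_DIR;
    }
    return path.append(Long.toString(remaining, FILES_PER_DIR)).toString();
  }
}
{code}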
[jira] [Created] (YARN-515) Node Manager not getting the master key
Robert Joseph Evans created YARN-515: Summary: Node Manager not getting the master key Key: YARN-515 URL: https://issues.apache.org/jira/browse/YARN-515 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Priority: Blocker On branch-2 the latest version I see the following on a secure cluster. {noformat} 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16 2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. 2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. 2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) {noformat} The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-450) Define value for * in the scheduling protocol
[ https://issues.apache.org/jira/browse/YARN-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616584#comment-13616584 ] Bikas Saha commented on YARN-450: - +1. Committed to trunk and branch-2. Thanks Zhijie! Define value for * in the scheduling protocol - Key: YARN-450 URL: https://issues.apache.org/jira/browse/YARN-450 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: YARN-450_1.patch, YARN-450_2.patch, YARN-450_3.patch, YARN-450_4.patch, YARN-450_5.patch, YARN-450_6.patch, YARN-450_7.patch The ResourceRequest has a string field to specify node/rack locations. For the cross-rack/cluster-wide location (ie when there is no locality constraint) the * string is used everywhere. However, its not defined anywhere and each piece of code either defines a local constant or uses the string literal. Defining * in the protocol and removing other local references from the code base will be good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616628#comment-13616628 ] Robert Joseph Evans commented on YARN-515: -- OK, it actually looks like the NM is trying to get the master key before it has ever set it, which is causing the NPE. Node Manager not getting the master key --- Key: YARN-515 URL: https://issues.apache.org/jira/browse/YARN-515 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Priority: Blocker On branch-2 the latest version I see the following on a secure cluster. {noformat} 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16 2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. 2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. 2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) {noformat} The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-509) $var shell substitution in properties are not expanded in hadoop-policy.xml
[ https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik updated YARN-509: -- Summary: $var shell substitution in properties are not expanded in hadoop-policy.xml (was: ResourceTrackerPB misses KerberosInfo annotation which renders YARN unusable on secure clusters) $var shell substitution in properties are not expanded in hadoop-policy.xml --- Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 3.0.0, 2.0.4-alpha Attachments: YARN-509.patch.txt During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that ResourceTrackerPB hasn't been annotated with {{@KerberosInfo}} nor {{@TokenInfo}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-509) $var shell substitution in properties are not expanded in hadoop-policy.xml
[ https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616685#comment-13616685 ] Roman Shaposhnik commented on YARN-509: --- Guys, I've updated the description of the JIRA to better reflect the latest findings. I'm leaving it as a blocker for now, expecting somebody else to chime in and propose whether we apply the patch I provided or RELNOTE this if there's not enough time to get to the bottom of the issue. $var shell substitution in properties are not expanded in hadoop-policy.xml --- Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 3.0.0, 2.0.4-alpha Attachments: YARN-509.patch.txt During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that ResourceTrackerPB hasn't been annotated with {{@KerberosInfo}} nor {{@TokenInfo}} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-515) Node Manager not getting the master key
[ https://issues.apache.org/jira/browse/YARN-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616698#comment-13616698 ] Robert Joseph Evans commented on YARN-515: -- This is really odd. I added logging in the ResourceTrackerService and in the NodeStatusUpdaterImpl. The RM sets the secret key in the RegisterNodeManagerResponse, but the NM only sees a null come out for it. Because of that, the heartbeat always fails with the NPE trying to read something that was never set. Node Manager not getting the master key --- Key: YARN-515 URL: https://issues.apache.org/jira/browse/YARN-515 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.4-alpha Reporter: Robert Joseph Evans Priority: Blocker On branch-2 the latest version I see the following on a secure cluster. {noformat} 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Security enabled - updating secret keys now 2013-03-28 19:21:06,243 [main] INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as RM:PORT with total resource of memory:12288, vCores:16 2013-03-28 19:21:06,244 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is started. 2013-03-28 19:21:06,245 [main] INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started. 2013-03-28 19:21:07,257 [Node Status Updater] ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Caught exception in status-updater java.lang.NullPointerException at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:121) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:407) {noformat} The Null pointer exception just keeps repeating and all of the nodes end up being lost. It looks like it never gets the secret key when it registers. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616708#comment-13616708 ] Zhijie Shen commented on YARN-276: -- IMO, the essential problem is that maxActiveApplications is a loose bound. See the formulas below.
1. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications
maxActiveApplications is computed by assuming each application only requires minAllocation. In fact, an AM container may require more. Therefore,
2. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications = (minAllocation_1 + minAllocation_2 + ... + minAllocation_k) <= (requestedResource_1 + requestedResource_2 + ... + requestedResource_k), where k = maxActiveApplications.
Hence, when maxActiveApplications applications are activated and they each require more than minAllocation, more than maximumApplicationMasterResourcePercent of clusterResource may be used by AMs, and even clusterResource itself may be exceeded. @nemon's solution looks good; it is actually a stricter bound on the maximum allowed active applications. Whenever an application is to be activated, the following criterion is checked:
3. clusterResource * maximumApplicationMasterResourcePercent - ApplicationMasterResource >= requestedResource
The issue here is that when this criterion is met, the maxActiveApplications check should be met as well, because this one is stricter. So instead of adding the new criterion, how about replacing maxActiveApplications with it? Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop 2.0.1, when I submit many jobs concurrently at the same time, the Capacity Scheduler can hang with most resources taken up by AMs and not enough resources left for tasks. And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not checked directly. Instead, this property is only used to compute maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
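To make formula 3 above concrete, here is a small sketch of the stricter activation check using plain memory arithmetic; the real CapacityScheduler change would go through its Resource/ResourceCalculator machinery, and these names are illustrative only.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

// Sketch only: activate another application only if the memory already
// committed to AM containers, plus this AM's actual request, stays within
// maximumApplicationMasterResourcePercent of the cluster.
public class AmLimitCheck {
  public static boolean canActivate(Resource clusterResource, Resource usedByAms,
      Resource requestedAmResource, float maxAmResourcePercent) {
    long limitMb = (long) (clusterResource.getMemory() * maxAmResourcePercent);
    return usedByAms.getMemory() + requestedAmResource.getMemory() <= limitMb;
  }

  public static void main(String[] args) {
    Resource cluster = Records.newRecord(Resource.class);
    cluster.setMemory(12288);
    Resource used = Records.newRecord(Resource.class);
    used.setMemory(1024);
    Resource request = Records.newRecord(Resource.class);
    request.setMemory(2048);
    // With 10% reserved for AMs the limit is 1228 MB, so 1024 + 2048 = 3072 MB is rejected.
    System.out.println(canActivate(cluster, used, request, 0.1f)); // false
  }
}
{code}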
[jira] [Assigned] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned YARN-475: Assignee: Hitesh Shah Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-475: - Issue Type: Sub-task (was: Bug) Parent: YARN-386 Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-435) Make it easier to access cluster topology information in an AM
[ https://issues.apache.org/jira/browse/YARN-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-435: - Issue Type: Sub-task (was: Bug) Parent: YARN-386 Make it easier to access cluster topology information in an AM -- Key: YARN-435 URL: https://issues.apache.org/jira/browse/YARN-435 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah ClientRMProtocol exposes a getClusterNodes api that provides a report on all nodes in the cluster including their rack information. However, this requires the AM to open and establish a separate connection to the RM in addition to one for the AMRMProtocol. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-475: - Attachment: YARN-475.1.patch Trivial patch - no tests. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-475.1.patch AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
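The replacement pattern the description asks AMs to follow, roughly: read the AM container id from the environment and derive the attempt id from it. The constant and converter names here reflect the 2.0-era API and may differ in later releases.
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Sketch only: derive the application attempt id from the AM's own container
// id instead of the removed AM_APP_ATTEMPT_ID_ENV variable.
public class AttemptIdFromEnv {
  public static ApplicationAttemptId currentAttemptId() {
    String containerIdStr =
        System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
{code}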
[jira] [Commented] (YARN-209) Capacity scheduler doesn't trigger app-activation after adding nodes
[ https://issues.apache.org/jira/browse/YARN-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616790#comment-13616790 ] Hudson commented on YARN-209: - Integrated in Hadoop-trunk-Commit #3537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3537/]) YARN-209. Fix CapacityScheduler to trigger application-activation when the cluster capacity changes. Contributed by Zhijie Shen. (Revision 1461773) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461773 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java Capacity scheduler doesn't trigger app-activation after adding nodes Key: YARN-209 URL: https://issues.apache.org/jira/browse/YARN-209 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bikas Saha Assignee: Zhijie Shen Fix For: 2.0.5-beta Attachments: YARN-209.1.patch, YARN-209.2.patch, YARN-209.3.patch, YARN-209.4.patch, YARN-209-test.patch Say application A is submitted but at that time it does not meet the bar for activation because of resource limit settings for applications. After that if more hardware is added to the system and the application becomes valid it still remains in pending state, likely forever. This might be rare to hit in real life because enough NM's heartbeat to the RM before applications can get submitted. But a change in settings or heartbeat interval might make it easier to repro. In RM restart scenarios, this will likely hit more if its implemented by re-playing events and re-submitting applications to the scheduler before the RPC to NM's is activated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-450) Define value for * in the scheduling protocol
[ https://issues.apache.org/jira/browse/YARN-450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616792#comment-13616792 ] Hudson commented on YARN-450: - Integrated in Hadoop-trunk-Commit #3537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3537/]) YARN-450. Define value for * in the scheduling protocol (Zhijie Shen via bikas) (Revision 1462271) Result = SUCCESS bikas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1462271 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRAppBenchmark.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/AMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/AMRMClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Task.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java *
[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable
[ https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616793#comment-13616793 ] Hudson commented on YARN-24: Integrated in Hadoop-trunk-Commit #3537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3537/]) YARN-24. Nodemanager fails to start if log aggregation enabled and namenode unavailable. (sandyr via tucu) (Revision 1461891) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1461891 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java Nodemanager fails to start if log aggregation enabled and namenode unavailable -- Key: YARN-24 URL: https://issues.apache.org/jira/browse/YARN-24 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.3, 2.0.0-alpha Reporter: Jason Lowe Assignee: Sandy Ryza Fix For: 2.0.5-beta Attachments: YARN-24-1.patch, YARN-24-2.patch, YARN-24-3.patch, YARN-24.patch If log aggregation is enabled and the namenode is currently unavailable, the nodemanager fails to startup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616804#comment-13616804 ] Hadoop QA commented on YARN-309: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575825/YARN-309.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/620//console This message is automatically generated. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-477) MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job
[ https://issues.apache.org/jira/browse/YARN-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616817#comment-13616817 ] Vinod Kumar Vavilapalli commented on YARN-477: -- Eli, please reopen the ticket if you run into this again. Tx. MiniYARNCluster: When container executor script fails to launch App Master, NM logs error, but Client doesn't get signaled to kill the job -- Key: YARN-477 URL: https://issues.apache.org/jira/browse/YARN-477 Project: Hadoop YARN Issue Type: Bug Reporter: Eli Reisman Assignee: Zhijie Shen I have been porting Giraph to YARN (GIRAPH-13 is the issue) and when I launch my App Master, if the container command line runs it successfully, any failure in the App Master or my launched Giraph Tasks promptly reports to Client and ends my job run. However, if the command line sent to the app master container fails to launch it at all, the error exit code is not propagating. My client hangs with the job at containersUsed == 1 and state == ACCEPTED for as long as you want to sit and wait before CTRL-C'ing your way out. Disclaimer: this could be my fault. But I wanted to throw it out there in case its not. I also (when this happens) not getting error logs since the app master never launched, so I really have no visibility into why it failed to launch. I am sure its not launching, but the client IS sending the app request, getting a container for my AM, and I see the command line run on the container in my logs. Thats all. Thanks! If this is a dup or won't fix for some reason, let me know and sorry for wasting your time! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-509) $var shell substitution in properties are not expanded in hadoop-policy.xml
[ https://issues.apache.org/jira/browse/YARN-509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616837#comment-13616837 ] Hadoop QA commented on YARN-509: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575809/YARN-509.patch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/618//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/618//console This message is automatically generated. $var shell substitution in properties are not expanded in hadoop-policy.xml --- Key: YARN-509 URL: https://issues.apache.org/jira/browse/YARN-509 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Environment: BigTop Kerberized cluster test environment Reporter: Konstantin Boudnik Priority: Blocker Fix For: 3.0.0, 2.0.4-alpha Attachments: YARN-509.patch.txt During BigTop 0.6.0 release test cycle, [~rvs] came around the following problem: {noformat} 013-03-26 15:37:03,573 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359) Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:162) at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68) ... 3 more Caused by: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:128) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:61) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:199) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:158) ... 
4 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB, expected client Kerberos principal is yarn/ip-10-46-37-244.ec2.internal@BIGTOP at org.apache.hadoop.ipc.Client.call(Client.java:1235) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59) ... 6 more {noformat} The most significant part is {{User yarn/ip-10-46-37-244.ec2.internal@BIGTOP (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.yarn.server.api.ResourceTrackerPB}} indicating that
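One plausible reading of the title is the gap between shell-style and Hadoop-style variable references: Hadoop's Configuration expands ${name} references (to other properties or system properties) when a value is read, but a bare $name is returned literally, so a shell-style value in hadoop-policy.xml never resolves. A minimal sketch of that behaviour follows; the property names are made up for illustration and are not from the attached patch.
{code}
import org.apache.hadoop.conf.Configuration;

public class VarExpansionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("yarn.admin.user", "yarn");

    conf.set("acl.expanded", "${yarn.admin.user}");   // ${...} is expanded on get()
    System.out.println(conf.get("acl.expanded"));     // prints "yarn"

    conf.set("acl.raw", "$yarn.admin.user");          // bare $var is NOT expanded
    System.out.println(conf.get("acl.raw"));          // prints "$yarn.admin.user"
  }
}
{code}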
[jira] [Commented] (YARN-493) NodeManager job control logic flaws on Windows
[ https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616846#comment-13616846 ] Hadoop QA commented on YARN-493: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575800/YARN-493.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/619//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/619//console This message is automatically generated. NodeManager job control logic flaws on Windows -- Key: YARN-493 URL: https://issues.apache.org/jira/browse/YARN-493 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: YARN-493.1.patch, YARN-493.2.patch Both product and test code contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
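The description points at assumptions such as bash being available and POSIX signals being usable to probe or kill a process. A hedged sketch of the kind of platform branching this implies is below; the exact commands are illustrative, not the ones the NodeManager actually runs.
{code}
import org.apache.hadoop.util.Shell;

public class ProcessProbeSketch {
  /** Build a command that checks whether a process is still alive. */
  static String[] isAliveCommand(String pid) {
    return Shell.WINDOWS
        ? new String[] { "cmd", "/c", "tasklist", "/FI", "PID eq " + pid }
        : new String[] { "bash", "-c", "kill -0 " + pid };   // assumes bash + POSIX signals
  }
}
{code}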
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616856#comment-13616856 ] Hadoop QA commented on YARN-101: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575823/YARN-101.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/622//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/622//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. 
HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup));
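The snippet quoted in the description lost its string quotes and generic type parameters in the JIRA formatting; restored, the problematic sequence reads roughly as follows. This is a reconstruction from the description, not the attached patch, and the remedy noted at the end is an assumption about the direction of the fix.
{code}
// Completed containers are removed from the NM's view here, before the RM hears about them.
NodeStatus nodeStatus = getNodeStatus();
nodeStatus.setResponseId(lastHeartBeatID);
NodeHeartbeatRequest request =
    recordFactory.newRecordInstance(NodeHeartbeatRequest.class);
request.setNodeStatus(nodeStatus);
// If this call throws, the completed-container statuses are already gone,
// so the ResourceManager never learns that those containers finished.
HeartbeatResponse response =
    resourceTracker.nodeHeartbeat(request).getHeartbeatResponse();

// One possible remedy (assumed, not necessarily the committed fix): keep completed
// containers in a pending list and only purge them once the heartbeat that
// reported them has been acknowledged by the ResourceManager.
{code}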
[jira] [Updated] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-309: --- Attachment: YARN-309.7.patch 1. Uploaded the new patch based on the latest trunk version. 2. Fixed the compile error. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-101: --- Attachment: YARN-101.4.patch Fix testcase failure If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) { // TODO Better error handling. Thread can die with the rest of the // NM still running. LOG.error(Caught exception in status-updater, e); } } } }.start(); } private NodeStatus getNodeStatus() { NodeStatus nodeStatus = recordFactory.newRecordInstance(NodeStatus.class); nodeStatus.setNodeId(this.nodeId); int numActiveContainers = 0; ListContainerStatus containersStatuses = new ArrayListContainerStatus(); for (IteratorEntryContainerId, Container i = this.context.getContainers().entrySet().iterator(); i.hasNext();) { EntryContainerId, Container e = i.next(); ContainerId containerId = e.getKey(); Container container = e.getValue(); // Clone the container to send it to the RM org.apache.hadoop.yarn.api.records.ContainerStatus containerStatus = container.cloneAndGetContainerStatus(); containersStatuses.add(containerStatus); ++numActiveContainers; LOG.info(Sending out status for container: + containerStatus); {color:red} // Here is the part that removes the completed containers. 
if (containerStatus.getState() == ContainerState.COMPLETE) { // Remove i.remove(); {color} LOG.info(Removed completed container + containerId); } } nodeStatus.setContainersStatuses(containersStatuses); LOG.debug(this.nodeId + sending out status
[jira] [Updated] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-193: - Attachment: YARN-193.9.patch Merged against the latest trunk, and replaced the newly introduced * with ResourceRequest.ANY, as YARN-450 has been committed. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616917#comment-13616917 ] Sandy Ryza commented on YARN-392: - Uploading a patch based on the earlier discussion here and on YARN-398. The patch adds a boolean flag to each resource request which essentially means "don't schedule using this resource request or any above it", and adds support for it to the fair scheduler. I call the flag noAllocateAt, but we could definitely use a better name if anybody has suggestions. I didn't use blacklist because it already has a meaning in the context of mapreduce, and to me seems to imply that a blacklisted rack would not allow any containers to be scheduled anywhere on it, when the meaning is a little different. Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
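A hypothetical sketch of how an AM might use the proposed flag follows. The setNoAllocateAt setter is the tentative name from the comment above, and the createRequest helper and capability object are placeholders, so none of this should be read as the final API.
{code}
// Ask for host17 specifically. The rack-level and *-level requests are still
// needed for the scheduler's bookkeeping, but are flagged so that nothing is
// actually allocated at those levels.
ResourceRequest nodeLevel = createRequest("host17.example.com", capability, 1);
ResourceRequest rackLevel = createRequest("/rack4", capability, 1);
ResourceRequest anyLevel  = createRequest(ResourceRequest.ANY, capability, 1);

rackLevel.setNoAllocateAt(true);   // hypothetical setter from the proposal
anyLevel.setNoAllocateAt(true);
{code}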
[jira] [Updated] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-392: Attachment: YARN-392-1.patch Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-467: --- Attachment: yarn-467-20130328.patch Incorporating the comments. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
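The description asks for a mechanism that creates a directory hierarchy and limits the number of entries per directory. One standard way to do that is a fixed fan-out keyed off the cache id, sketched below; the scheme and names are illustrative and not the layout the attached patch implements.
{code}
import java.util.Locale;

public class CacheDirSketch {
  /**
   * Fan each cache id out over two fixed levels, e.g.
   * 3901886847734194975 -> "75/49/3901886847734194975".
   * At most 100 sub-directories exist at each level, and entries are spread
   * across 10,000 leaf directories instead of one flat filecache directory.
   */
  static String relativePathFor(long cacheId) {
    long level1 = ((cacheId % 100) + 100) % 100;
    long level2 = (((cacheId / 100) % 100) + 100) % 100;
    return String.format(Locale.ROOT, "%02d/%02d/%d", level1, level2, cacheId);
  }
}
{code}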
[jira] [Commented] (YARN-475) Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment
[ https://issues.apache.org/jira/browse/YARN-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616969#comment-13616969 ] Hadoop QA commented on YARN-475: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575966/YARN-475.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/623//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/623//console This message is automatically generated. Remove ApplicationConstants.AM_APP_ATTEMPT_ID_ENV as it is no longer set in an AM's environment --- Key: YARN-475 URL: https://issues.apache.org/jira/browse/YARN-475 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-475.1.patch AMs are expected to use ApplicationConstants.AM_CONTAINER_ID_ENV and derive the application attempt id from the container id. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
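A short sketch of the pattern the description expects AMs to follow: read the container id from the environment and derive the attempt id from it, instead of relying on the removed variable. The class and method here are illustrative; only the constants and conversion utilities are taken from the YARN API of the time.
{code}
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromEnvSketch {
  static ApplicationAttemptId currentAttemptId() {
    // The NM sets the container id in the AM's environment.
    String containerIdStr = System.getenv(ApplicationConstants.AM_CONTAINER_ID_ENV);
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    return containerId.getApplicationAttemptId();
  }
}
{code}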
[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits
[ https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616980#comment-13616980 ] Hadoop QA commented on YARN-193: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575991/YARN-193.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/625//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/625//console This message is automatically generated. Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits - Key: YARN-193 URL: https://issues.apache.org/jira/browse/YARN-193 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Hitesh Shah Assignee: Zhijie Shen Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch, YARN-193.6.patch, YARN-193.7.patch, YARN-193.8.patch, YARN-193.9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-101) If the heartbeat message loss, the nodestatus info of complete container will loss too.
[ https://issues.apache.org/jira/browse/YARN-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616982#comment-13616982 ] Hadoop QA commented on YARN-101: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575989/YARN-101.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/626//console This message is automatically generated. If the heartbeat message loss, the nodestatus info of complete container will loss too. Key: YARN-101 URL: https://issues.apache.org/jira/browse/YARN-101 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Environment: suse. Reporter: xieguiming Assignee: Xuan Gong Priority: Minor Attachments: YARN-101.1.patch, YARN-101.2.patch, YARN-101.3.patch, YARN-101.4.patch see the red color: org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.java protected void startStatusUpdater() { new Thread(Node Status Updater) { @Override @SuppressWarnings(unchecked) public void run() { int lastHeartBeatID = 0; while (!isStopped) { // Send heartbeat try { synchronized (heartbeatMonitor) { heartbeatMonitor.wait(heartBeatInterval); } {color:red} // Before we send the heartbeat, we get the NodeStatus, // whose method removes completed containers. NodeStatus nodeStatus = getNodeStatus(); {color} nodeStatus.setResponseId(lastHeartBeatID); NodeHeartbeatRequest request = recordFactory .newRecordInstance(NodeHeartbeatRequest.class); request.setNodeStatus(nodeStatus); {color:red} // But if the nodeHeartbeat fails, we've already removed the containers away to know about it. We aren't handling a nodeHeartbeat failure case here. 
HeartbeatResponse response = resourceTracker.nodeHeartbeat(request).getHeartbeatResponse(); {color} if (response.getNodeAction() == NodeAction.SHUTDOWN) { LOG .info(Recieved SHUTDOWN signal from Resourcemanager as part of heartbeat, + hence shutting down.); NodeStatusUpdaterImpl.this.stop(); break; } if (response.getNodeAction() == NodeAction.REBOOT) { LOG.info(Node is out of sync with ResourceManager, + hence rebooting.); NodeStatusUpdaterImpl.this.reboot(); break; } lastHeartBeatID = response.getResponseId(); ListContainerId containersToCleanup = response .getContainersToCleanupList(); if (containersToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedContainersEvent(containersToCleanup)); } ListApplicationId appsToCleanup = response.getApplicationsToCleanupList(); //Only start tracking for keepAlive on FINISH_APP trackAppsForKeepAlive(appsToCleanup); if (appsToCleanup.size() != 0) { dispatcher.getEventHandler().handle( new CMgrCompletedAppsEvent(appsToCleanup)); } } catch (Throwable e) {
[jira] [Commented] (YARN-309) Make RM provide heartbeat interval to NM
[ https://issues.apache.org/jira/browse/YARN-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616989#comment-13616989 ] Hadoop QA commented on YARN-309: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575985/YARN-309.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/624//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/624//console This message is automatically generated. Make RM provide heartbeat interval to NM Key: YARN-309 URL: https://issues.apache.org/jira/browse/YARN-309 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-309.1.patch, YARN-309.2.patch, YARN-309.3.patch, YARN-309.4.patch, YARN-309.5.patch, YARN-309.6.patch, YARN-309.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-467) Jobs fail during resource localization when public distributed-cache hits unix directory limits
[ https://issues.apache.org/jira/browse/YARN-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616995#comment-13616995 ] Hadoop QA commented on YARN-467: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576003/yarn-467-20130328.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/627//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/627//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/627//console This message is automatically generated. Jobs fail during resource localization when public distributed-cache hits unix directory limits --- Key: YARN-467 URL: https://issues.apache.org/jira/browse/YARN-467 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.0.0-alpha Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-467-20130322.1.patch, yarn-467-20130322.2.patch, yarn-467-20130322.3.patch, yarn-467-20130322.patch, yarn-467-20130325.1.patch, yarn-467-20130325.path, yarn-467-20130328.patch If we have multiple jobs which uses distributed cache with small size of files, the directory limit reaches before reaching the cache size and fails to create any directories in file cache (PUBLIC). The jobs start failing with the below exception. 
java.io.IOException: mkdir of /tmp/nm-local-dir/filecache/3901886847734194975 failed at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:909) at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143) at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706) at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703) at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:147) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) we need to have a mechanism where in we can create directory hierarchy and limit number of files per directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617008#comment-13617008 ] nemon lou commented on YARN-276: [~zjshen] Yes, a dynamic maxActiveApplications will work too, and there is no need to add any new criteria. I'll give it a try. Thanks. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang with most resources taken up by AMs, leaving too few resources for tasks; all applications then hang. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not checked directly. Instead, this property is only used to compute maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
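A hedged sketch of the dynamic check being discussed above: gate AM activation on the resources AMs actually occupy rather than on a maxActiveApplications count derived from minimumAllocation. All field and helper names are illustrative, not the CapacityScheduler's real members or the committed fix, and only memory is considered for simplicity.
{code}
boolean canActivateApplication(long amMemoryAsked) {
  long clusterMemory = clusterResource.getMemory();
  long amMemoryLimit = (long) (clusterMemory * maxAMResourcePercent);
  long amMemoryInUse = memoryUsedByRunningAMs();   // hypothetical helper
  // Admit the AM only if the AMs' actual footprint stays under the percent cap.
  return amMemoryInUse + amMemoryAsked <= amMemoryLimit;
}
{code}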
[jira] [Commented] (YARN-493) NodeManager job control logic flaws on Windows
[ https://issues.apache.org/jira/browse/YARN-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617063#comment-13617063 ] Chris Nauroth commented on YARN-493: The test failure is unrelated. I suspect it was introduced in the patch for HADOOP-9357. I've added comments on that issue to discuss. NodeManager job control logic flaws on Windows -- Key: YARN-493 URL: https://issues.apache.org/jira/browse/YARN-493 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 3.0.0 Attachments: YARN-493.1.patch, YARN-493.2.patch Both product and test code contain some platform-specific assumptions, such as availability of bash for executing a command in a container and signals to check existence of a process and terminate it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-276: Assignee: nemon lou Assigning this to [~nemon]. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang with most resources taken up by AMs, leaving too few resources for tasks; all applications then hang. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not checked directly. Instead, this property is only used to compute maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
Vinod Kumar Vavilapalli created YARN-516: Summary: TestContainerLocalizer.testContainerLocalizerMain is failing Key: YARN-516 URL: https://issues.apache.org/jira/browse/YARN-516 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-516) TestContainerLocalizer.testContainerLocalizerMain is failing
[ https://issues.apache.org/jira/browse/YARN-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617102#comment-13617102 ] Vinod Kumar Vavilapalli commented on YARN-516: -- It is failing with the following: {code}Argument(s) are different! Wanted: localFs.mkdir( file:/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, isA(org.apache.hadoop.fs.permission.FsPermission), false ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:139) Actual invocation has different arguments: localFs.mkdir( /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer/0/usercache/yak/filecache, rwxr-xr-x, false ); - at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:132) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestContainerLocalizer.testContainerLocalizerMain(TestContainerLocalizer.java:139) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) {code} TestContainerLocalizer.testContainerLocalizerMain is failing Key: YARN-516 URL: https://issues.apache.org/jira/browse/YARN-516 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
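The mismatch in the trace is that the expected Path carries a file: scheme while the recorded call does not, and the permission matcher did not line up with the concrete rwxr-xr-x argument. A hedged sketch of one way such a verification is usually made tolerant (assuming Mockito 1.x, as used at the time) is below; the mock and variable names are illustrative and this is not necessarily the fix that was committed.
{code}
verify(localFs).mkdir(
    argThat(new ArgumentMatcher<Path>() {
      @Override
      public boolean matches(Object argument) {
        // Compare only the path component, so "file:/.../filecache" and
        // "/.../filecache" are treated as the same directory.
        return expectedDir.equals(((Path) argument).toUri().getPath());
      }
    }),
    isA(FsPermission.class),
    eq(false));
{code}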