[jira] [Created] (YARN-2129) Add scheduling priority to the WindowsSecureContainerExecutor
Remus Rusanu created YARN-2129: -- Summary: Add scheduling priority to the WindowsSecureContainerExecutor Key: YARN-2129 URL: https://issues.apache.org/jira/browse/YARN-2129 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 3.0.0 Reporter: Remus Rusanu Assignee: Remus Rusanu The WCE (YARN-1972) could and should honor NM_CONTAINER_EXECUTOR_SCHED_PRIORITY. -- This message was sent by Atlassian JIRA (v6.2#6252)
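For context, NM_CONTAINER_EXECUTOR_SCHED_PRIORITY is the YarnConfiguration key the other executors already consult; a minimal sketch of how the WCE could pick it up is below (the surrounding plumbing is illustrative, not the actual YARN-2129 patch):
{code}
// Sketch only: read the configured scheduling-priority adjustment so the WCE
// can apply it when launching containers; defaults to 0 (no adjustment).
int containerSchedPriorityAdjustment = conf.getInt(
    YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY, 0);
boolean containerSchedPriorityIsSet =
    conf.get(YarnConfiguration.NM_CONTAINER_EXECUTOR_SCHED_PRIORITY) != null;
{code}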
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019588#comment-14019588 ] Jian He commented on YARN-2030: --- I meant add an abstract getProto method. > Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch, > YARN-2030.v3.patch > > > Now the logic to handle different store events in handleStoreEvent() is as > following: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This is not only confuse people but also led to mistake easily. We may > leverage state machine to simply this even no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019581#comment-14019581 ] Jian He commented on YARN-2030: --- I see, thanks for the update. Maybe just promote getProto() to the abstract class, given this record is used internally by RM only? should be fine? > Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch, > YARN-2030.v3.patch > > > Now the logic to handle different store events in handleStoreEvent() is as > following: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This is not only confuse people but also led to mistake easily. We may > leverage state machine to simply this even no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019574#comment-14019574 ] Jian He commented on YARN-1365: --- bq. The option I see is we pass in a flag to AppAttemptAddedSchedulerEvent that tells the scheduler not to issue ATTEMPT_ADDED. Makes sense. Anubhav, do you want to comment on YARN-1368 so that I can fix it, or do you want to include the fix here? > ApplicationMasterService to allow Register and Unregister of an app that was > running before restart > --- > > Key: YARN-1365 > URL: https://issues.apache.org/jira/browse/YARN-1365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Anubhav Dhoot > Attachments: YARN-1365.001.patch, YARN-1365.002.patch, > YARN-1365.003.patch, YARN-1365.initial.patch > > > For an application that was running before restart, the > ApplicationMasterService currently throws an exception when the app tries to > make the initial register or final unregister call. These should succeed and > the RMApp state machine should transition to completed like normal. > Unregistration should succeed for an app that the RM considers complete since > the RM may have died after saving completion in the store but before > notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2122) In AllocationFileLoaderService, the reloadThread should be created in init() and started in start()
[ https://issues.apache.org/jira/browse/YARN-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019565#comment-14019565 ] Karthik Kambatla commented on YARN-2122: {code} public void serviceInit(Configuration conf) { this.allocFile = getAllocationFile(conf); super.init(conf); reloadThread = new Thread() { public void run() { {code} - We should call super.serviceInit instead of super.init, and that call should be the last statement of the method. - Creation of reloadThread should be guarded by if (allocFile != null) {code} public void serviceStart() { if (allocFile == null) { return; } reloadThread.start(); super.start(); } {code} - It is good practice to call super.serviceStart() at the end of this method (not super.start()). So, we should probably not check for (allocFile == null) and instead call reloadThread.start() after checking (allocFile != null). - super.serviceStop instead of super.stop - For the findbugs warning, I see why we don't have to do any additional synchronization. Can we add a findbugs exclusion? > In AllocationFileLoaderService, the reloadThread should be created in init() > and started in start() > --- > > Key: YARN-2122 > URL: https://issues.apache.org/jira/browse/YARN-2122 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > Attachments: YARN-2122.patch, YARN-2122.patch > > > AllcoationFileLoaderService has this reloadThread that is currently created > and started in start(). Instead, it should be created in init() and started > in start(). -- This message was sent by Atlassian JIRA (v6.2#6252)
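Putting Karthik's points together, the suggested shape is roughly the following (a sketch only; the reload logic and field names are elided/assumed, not taken from the attached patch):
{code}
@Override
public void serviceInit(Configuration conf) throws Exception {
  this.allocFile = getAllocationFile(conf);
  if (allocFile != null) {
    // create, but do not start, the reload thread here
    reloadThread = new Thread() {
      @Override
      public void run() {
        // ... periodically check allocFile and reload the allocations ...
      }
    };
    reloadThread.setName("AllocationFileReloader");
    reloadThread.setDaemon(true);
  }
  super.serviceInit(conf); // parent call goes last
}

@Override
public void serviceStart() throws Exception {
  if (allocFile != null) {
    reloadThread.start();
  }
  super.serviceStart(); // likewise, call the parent last
}
{code}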
[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
[ https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019561#comment-14019561 ] Karthik Kambatla commented on YARN-1514: The patch looks like a good first-cut. Could you add the configuration options as you mentioned? > Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA > > > Key: YARN-1514 > URL: https://issues.apache.org/jira/browse/YARN-1514 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Fix For: 2.5.0 > > Attachments: YARN-1514.wip.patch > > > ZKRMStateStore is very sensitive to ZNode-related operations as discussed in > YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is > called when RM-HA cluster does failover. Therefore, its execution time > impacts failover time of RM-HA. > We need utility to benchmark time execution time of ZKRMStateStore#loadStore > as development tool. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file
[ https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019558#comment-14019558 ] Karthik Kambatla commented on YARN-1874: Barely skimmed through the patch, changes look reasonable. However, it would be easier if we could split this into smaller patches. At the least, RMContext related parts could be done in a separate JIRA first. > Cleanup: Move RMActiveServices out of ResourceManager into its own file > --- > > Key: YARN-1874 > URL: https://issues.apache.org/jira/browse/YARN-1874 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Karthik Kambatla >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, > YARN-1874.4.patch > > > As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We > should move RMActiveServices out to make it more manageable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1424) RMAppAttemptImpl should precompute a zeroed ApplicationResourceUsageReport to return when attempt not active
[ https://issues.apache.org/jira/browse/YARN-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019541#comment-14019541 ] Karthik Kambatla commented on YARN-1424: In my opinion, DUMMY_APPLICATION_RESOURCE_USAGE_REPORT should have zeroes for all resource-related values as opposed to -1. In case people fetch these reports and try to accumulate statistics, -1 would throw their math off. The app_id itself has -1 in it, so I don't see a risk of it being misconstrued. > RMAppAttemptImpl should precompute a zeroed ApplicationResourceUsageReport to > return when attempt not active > > > Key: YARN-1424 > URL: https://issues.apache.org/jira/browse/YARN-1424 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Sandy Ryza >Assignee: Ray Chiang >Priority: Minor > Labels: newbie > Attachments: YARN1424-01.patch > > > RMAppImpl has a DUMMY_APPLICATION_RESOURCE_USAGE_REPORT to return when the > caller of createAndGetApplicationReport doesn't have access. > RMAppAttemptImpl should have something similar for > getApplicationResourceUsageReport. > It also might make sense to put the dummy report into > ApplicationResourceUsageReport and allow both to use it. > A test would also be useful to verify that > RMAppAttemptImpl#getApplicationResourceUsageReport doesn't return null if the > scheduler doesn't have a report to return. -- This message was sent by Atlassian JIRA (v6.2#6252)
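For illustration, the zeroed report could be precomputed along these lines (a sketch; the exact ApplicationResourceUsageReport.newInstance arguments are assumed rather than copied from the attached patch):
{code}
// Sketch: an all-zero usage report to hand back when the attempt is not active.
private static final ApplicationResourceUsageReport DUMMY_APPLICATION_RESOURCE_USAGE_REPORT =
    ApplicationResourceUsageReport.newInstance(0, 0,
        Resources.createResource(0, 0),   // used
        Resources.createResource(0, 0),   // reserved
        Resources.createResource(0, 0));  // needed
{code}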
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019510#comment-14019510 ] Wangda Tan commented on YARN-2074: -- Can we populate "attemptFailureCount" to the AM when the AM registers? This should be a valuable fix; other applications besides MR can benefit from this too. I think it's better to create a separate JIRA to track this. [~jianhe], [~mayank_bansal] Any thoughts? > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019494#comment-14019494 ] Siqi Li commented on YARN-2120: --- [~ashwinshankar77] I have attached a new screenshot, which retains the original color format for fairShare. Additionally, it uses blue and red border colors to indicate whether a queue exceeds its minShare or maxShare. If minShare is not set, the blue border is not displayed. I will run some tests and upload the patch shortly. > Coloring queues running over minShare on RM Scheduler page > -- > > Key: YARN-2120 > URL: https://issues.apache.org/jira/browse/YARN-2120 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: 1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png, > YARN-2120.v1.patch > > > Today RM Scheduler page shows FairShare, Used, Used (over fair share) and > MaxCapacity. > Since fairShare is displaying with dotted line, I think we can stop > displaying orange when a queue over its fairshare. > It would be better to show a queue running over minShare with orange color, > so that we know queue is running more than its min share. > Also, we can display a queue running at maxShare with red color. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2120) Coloring queues running over minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019488#comment-14019488 ] Hadoop QA commented on YARN-2120: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648584/1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3918//console This message is automatically generated. > Coloring queues running over minShare on RM Scheduler page > -- > > Key: YARN-2120 > URL: https://issues.apache.org/jira/browse/YARN-2120 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: 1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png, > YARN-2120.v1.patch > > > Today RM Scheduler page shows FairShare, Used, Used (over fair share) and > MaxCapacity. > Since fairShare is displaying with dotted line, I think we can stop > displaying orange when a queue over its fairshare. > It would be better to show a queue running over minShare with orange color, > so that we know queue is running more than its min share. > Also, we can display a queue running at maxShare with red color. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2128) SchedulerApplicationAttempt's amResource should be normalized instead of fetching from ApplicationSubmissionContext directly
[ https://issues.apache.org/jira/browse/YARN-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019486#comment-14019486 ] Hadoop QA commented on YARN-2128: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648575/YARN-2128.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3917//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3917//console This message is automatically generated. > SchedulerApplicationAttempt's amResource should be normalized instead of > fetching from ApplicationSubmissionContext directly > > > Key: YARN-2128 > URL: https://issues.apache.org/jira/browse/YARN-2128 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2128.patch > > > The amResource should be normalized. > {code} > ApplicationSubmissionContext appSubmissionContext = > rmContext.getRMApps().get(applicationAttemptId.getApplicationId()) > .getApplicationSubmissionContext(); > if (appSubmissionContext != null) { > amResource = appSubmissionContext.getResource(); > unmanagedAM = appSubmissionContext.getUnmanagedAM(); > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2120) Coloring queues running over minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2120: -- Attachment: (was: AD45B623-9F14-420B-B1FB-1186E2B5EC4A.png) > Coloring queues running over minShare on RM Scheduler page > -- > > Key: YARN-2120 > URL: https://issues.apache.org/jira/browse/YARN-2120 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: 1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png, > YARN-2120.v1.patch > > > Today RM Scheduler page shows FairShare, Used, Used (over fair share) and > MaxCapacity. > Since fairShare is displaying with dotted line, I think we can stop > displaying orange when a queue over its fairshare. > It would be better to show a queue running over minShare with orange color, > so that we know queue is running more than its min share. > Also, we can display a queue running at maxShare with red color. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2120) Coloring queues running over minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2120: -- Attachment: 1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png > Coloring queues running over minShare on RM Scheduler page > -- > > Key: YARN-2120 > URL: https://issues.apache.org/jira/browse/YARN-2120 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: 1595210E-C2D6-42D3-8C1D-73DFF1614CF9.png, > YARN-2120.v1.patch > > > Today RM Scheduler page shows FairShare, Used, Used (over fair share) and > MaxCapacity. > Since fairShare is displaying with dotted line, I think we can stop > displaying orange when a queue over its fairshare. > It would be better to show a queue running over minShare with orange color, > so that we know queue is running more than its min share. > Also, we can display a queue running at maxShare with red color. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2128) SchedulerApplicationAttempt's amResource should be normalized instead of fetching from ApplicationSubmissionContext directly
[ https://issues.apache.org/jira/browse/YARN-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2128: -- Attachment: YARN-2128.patch > SchedulerApplicationAttempt's amResource should be normalized instead of > fetching from ApplicationSubmissionContext directly > > > Key: YARN-2128 > URL: https://issues.apache.org/jira/browse/YARN-2128 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2128.patch > > > The amResource should be normalized. > {code} > ApplicationSubmissionContext appSubmissionContext = > rmContext.getRMApps().get(applicationAttemptId.getApplicationId()) > .getApplicationSubmissionContext(); > if (appSubmissionContext != null) { > amResource = appSubmissionContext.getResource(); > unmanagedAM = appSubmissionContext.getUnmanagedAM(); > } > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2128) SchedulerApplicationAttempt's amResource should be normalized instead of fetching from ApplicationSubmissionContext directly
Wei Yan created YARN-2128: - Summary: SchedulerApplicationAttempt's amResource should be normalized instead of fetching from ApplicationSubmissionContext directly Key: YARN-2128 URL: https://issues.apache.org/jira/browse/YARN-2128 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan The amResource should be normalized. {code} ApplicationSubmissionContext appSubmissionContext = rmContext.getRMApps().get(applicationAttemptId.getApplicationId()) .getApplicationSubmissionContext(); if (appSubmissionContext != null) { amResource = appSubmissionContext.getResource(); unmanagedAM = appSubmissionContext.getUnmanagedAM(); } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
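A hedged sketch of the direction the summary suggests is below; Resources.normalize is a real utility, but the calculator and min/max/increment values would have to be supplied by the scheduler, so those names are placeholders rather than the actual patch:
{code}
// Sketch only: normalize the AM resource instead of using the raw submission value.
// 'calculator', 'minAlloc', 'maxAlloc' and 'incrAlloc' are assumed to be provided
// by the scheduler at this point.
if (appSubmissionContext != null) {
  amResource = Resources.normalize(calculator,
      appSubmissionContext.getResource(), minAlloc, maxAlloc, incrAlloc);
  unmanagedAM = appSubmissionContext.getUnmanagedAM();
}
{code}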
[jira] [Updated] (YARN-2126) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM
[ https://issues.apache.org/jira/browse/YARN-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2126: -- Description: When an application is removed, the FSLeafQueue updates its amResourceUsage. {code} if (runnableAppScheds.remove(app.getAppSchedulable())) { // Update AM resource usage if (app.getAMResource() != null) { Resources.subtractFrom(amResourceUsage, app.getAMResource()); } return true; } {code} If an application is removed before it has a chance to start its AM, the amResourceUsage shouldn't be updated. was: When an application is removed, the FSLeafQueue updates its amResourceUsage. {code} if (runnableAppScheds.remove(app.getAppSchedulable())) { // Update AM resource usage if (app.getAMResource() != null) { Resources.subtractFrom(amResourceUsage, app.getAMResource()); } return true; } {code} If an application is removed before it has a change to start its AM, the amResourceUsage shouldn't be updated. > The FSLeafQueue.amResourceUsage shouldn't be updated when an Application > removed before it runs AM > -- > > Key: YARN-2126 > URL: https://issues.apache.org/jira/browse/YARN-2126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Fix For: 2.5.0 > > > When an application is removed, the FSLeafQueue updates its amResourceUsage. > {code} > if (runnableAppScheds.remove(app.getAppSchedulable())) { > // Update AM resource usage > if (app.getAMResource() != null) { > Resources.subtractFrom(amResourceUsage, app.getAMResource()); > } > return true; > } > {code} > If an application is removed before it has a chance to start its AM, the > amResourceUsage shouldn't be updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
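One possible shape of the fix, sketched from the description above (it assumes a flag such as isAmRunning() on the application that records whether the AM was ever launched; names are illustrative, not the committed change):
{code}
// Sketch: only subtract the AM resource if it was actually added to amResourceUsage,
// i.e. the AM had started before the application was removed.
if (runnableAppScheds.remove(app.getAppSchedulable())) {
  if (app.isAmRunning() && app.getAMResource() != null) {
    Resources.subtractFrom(amResourceUsage, app.getAMResource());
  }
  return true;
}
{code}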
[jira] [Commented] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common
[ https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019165#comment-14019165 ] Steve Loughran commented on YARN-2127: -- I can incorporate this into YARN-679 easily enough - I just wanted to flag it as one of the actions I'd like to do. The YARN-679 service launcher does not depend on it - but its throwable-catching logic would be flawed without it. > Move YarnUncaughtExceptionHandler into Hadoop common > > > Key: YARN-2127 > URL: https://issues.apache.org/jira/browse/YARN-2127 > Project: Hadoop YARN > Issue Type: Improvement > Components: api >Affects Versions: 2.4.0 >Reporter: Steve Loughran >Priority: Minor > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > Create a superclass of {{YarnUncaughtExceptionHandler}} in the hadoop-common > code (retaining the original for compatibility). > This would be available for any hadoop application to use, and the YARN-679 > launcher could automatically set up the handler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common
Steve Loughran created YARN-2127: Summary: Move YarnUncaughtExceptionHandler into Hadoop common Key: YARN-2127 URL: https://issues.apache.org/jira/browse/YARN-2127 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.4.0 Reporter: Steve Loughran Priority: Minor Create a superclass of {{YarnUncaughtExceptionHandler}} in the hadoop-common code (retaining the original for compatibility). This would be available for any hadoop application to use, and the YARN-679 launcher could automatically set up the handler. -- This message was sent by Atlassian JIRA (v6.2#6252)
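Roughly, the common superclass could look like the sketch below, assuming the behavior being pulled up matches what the YARN handler already does (log the throwable; bail out fast on an Error). The class name and exit code are illustrative:
{code}
// Sketch: a generic handler in hadoop-common that YarnUncaughtExceptionHandler could extend.
public class HadoopUncaughtExceptionHandler implements Thread.UncaughtExceptionHandler {
  private static final Log LOG = LogFactory.getLog(HadoopUncaughtExceptionHandler.class);

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    if (e instanceof Error) {
      // JVM state is suspect; halt rather than attempting a clean shutdown.
      ExitUtil.halt(-1, "Thread " + t + " threw an Error: " + e);
    } else {
      LOG.error("Thread " + t + " threw an Exception", e);
    }
  }
}
{code}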
[jira] [Updated] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2030: Attachment: YARN-2030.v3.patch Thanks for the comments [~djp] and [~jianhe]. I updated the patch to make ApplicationAttemptStateData and ApplicationStateData abstract classes. bq. Accordingly storeApplicationStateInternal can take in ApplicationStateData instead of ApplicationStateDataPBImpl as the argument to avoid the type cast. I tried to change the updateApplicationAttemptStateInternal parameter type from PBImpl to the abstract records, but it looks like some RMStateStore implementations (FileSystemRMStateStore and ZKRMStateStore) require the parameter to be a PBImpl (so they can use toProto to serialize). > Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch, > YARN-2030.v3.patch > > > Now the logic to handle different store events in handleStoreEvent() is as > following: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This is not only confuse people but also led to mistake easily. We may > leverage state machine to simply this even no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
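For context, the shape under discussion is roughly the following (a sketch; the factory parameters and the proto type's package are omitted/assumed rather than taken from the v3 patch):
{code}
// Sketch: ApplicationStateData as an abstract record that exposes getProto(),
// so RMStateStore implementations can serialize without casting to the PBImpl.
public abstract class ApplicationStateData {
  // the existing newInstance(...) factory and getters would remain here
  public abstract ApplicationStateDataProto getProto();
}
{code}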
[jira] [Updated] (YARN-2126) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM
[ https://issues.apache.org/jira/browse/YARN-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2126: -- Fix Version/s: 2.5.0 > The FSLeafQueue.amResourceUsage shouldn't be updated when an Application > removed before it runs AM > -- > > Key: YARN-2126 > URL: https://issues.apache.org/jira/browse/YARN-2126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Fix For: 2.5.0 > > > When an application is removed, the FSLeafQueue updates its amResourceUsage. > {code} > if (runnableAppScheds.remove(app.getAppSchedulable())) { > // Update AM resource usage > if (app.getAMResource() != null) { > Resources.subtractFrom(amResourceUsage, app.getAMResource()); > } > return true; > } > {code} > If an application is removed before it has a change to start its AM, the > amResourceUsage shouldn't be updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1977) Add tests on getApplicationRequest with filtering start time range
[ https://issues.apache.org/jira/browse/YARN-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018901#comment-14018901 ] Hudson commented on YARN-1977: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5651 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5651/]) YARN-1977. Add tests on getApplicationRequest with filtering start time range. (Contributed by Junping Du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600644) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java > Add tests on getApplicationRequest with filtering start time range > -- > > Key: YARN-1977 > URL: https://issues.apache.org/jira/browse/YARN-1977 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Fix For: 2.5.0 > > Attachments: YARN-1977.patch > > > There is no unit test to verify if request with start time range works to get > right application list, we should add it. -- This message was sent by Atlassian JIRA (v6.2#6252)
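The added coverage is essentially of this form (a sketch; the setter and assertion details are assumed from the description, not copied from the committed test):
{code}
// Sketch: ask for applications whose start time falls inside a given range and
// verify only the matching ones come back.
GetApplicationsRequest request = GetApplicationsRequest.newInstance();
request.setStartRange(submitTimeBegin, submitTimeEnd); // assumed setter for the start-time range
GetApplicationsResponse response = rmService.getApplications(request);
Assert.assertEquals("Only apps started inside the range should be returned",
    1, response.getApplicationList().size());
{code}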
[jira] [Created] (YARN-2126) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM
Wei Yan created YARN-2126: - Summary: The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM Key: YARN-2126 URL: https://issues.apache.org/jira/browse/YARN-2126 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan When an application is removed, the FSLeafQueue updates its amResourceUsage. {code} if (runnableAppScheds.remove(app.getAppSchedulable())) { // Update AM resource usage if (app.getAMResource() != null) { Resources.subtractFrom(amResourceUsage, app.getAMResource()); } return true; } {code} If an application is removed before it has a chance to start its AM, the amResourceUsage shouldn't be updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018858#comment-14018858 ] Hudson commented on YARN-2061: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1792 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1792/]) YARN-2061. Revisit logging levels in ZKRMStateStore. (Ray Chiang via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600498) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java > Revisit logging levels in ZKRMStateStore > - > > Key: YARN-2061 > URL: https://issues.apache.org/jira/browse/YARN-2061 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Ray Chiang >Priority: Minor > Labels: newbie > Fix For: 2.5.0 > > Attachments: YARN2061-01.patch > > > ZKRMStateStore has a few places where it is logging at the INFO level. We > should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
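The change amounts to demoting chatty per-operation messages; where a message is still useful for debugging, the usual guarded pattern applies (illustrative snippet, not a line from the patch; the variables are placeholders):
{code}
// Sketch: INFO-level chatter demoted to guarded DEBUG.
if (LOG.isDebugEnabled()) {
  LOG.debug("Storing state for app " + appId + " at znode " + nodeCreatePath);
}
{code}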
[jira] [Commented] (YARN-2119) DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT
[ https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018857#comment-14018857 ] Hudson commented on YARN-2119: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1792 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1792/]) YARN-2119. DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600484) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServer.java > DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT > --- > > Key: YARN-2119 > URL: https://issues.apache.org/jira/browse/YARN-2119 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.5.0 > > Attachments: YARN-2119.patch > > > The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] > introduced an method to get web proxy bind address with the incorrect default > port. Because all the users of the method (only 1 user) ignores the port, its > not breaking anything yet. Fixing it in case someone else uses this in the > future. -- This message was sent by Atlassian JIRA (v6.2#6252)
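In other words, the default address constant should be derived from the port constant so the two cannot drift apart; a sketch of the intended wiring (the literal values here are illustrative, not necessarily the real defaults):
{code}
// Sketch: tie DEFAULT_PROXY_ADDRESS to DEFAULT_PROXY_PORT instead of hard-coding another port.
public static final int DEFAULT_PROXY_PORT = 9099; // illustrative value
public static final String DEFAULT_PROXY_ADDRESS = "0.0.0.0:" + DEFAULT_PROXY_PORT;
{code}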
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Description: h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and of the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files. The approach on the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself is already dealing with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE, one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE's `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement automatically implies that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high-privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers.
I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service on the project is not trivial. was: Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrasturcture to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overwrrides some emthods to the effect of: - change the DCE created user cache directories to be owned by the job user and by the nodemanager group. - changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' - runs the localization as standalone process instead of an in-process Java method call. This in turn relies on the winutil createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: - it doe
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Description: Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and of the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: - changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. - changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' - runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to LinuxContainerExecutor (LCE), the WCE has some minor differences: - it does not delegate the creation of the user cache directories to the native implementation. - it does not require special handling to be able to delete user files. The approach on the WCE came from practical trial and error. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself is already dealing with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. Deployment Requirements --- To use the WCE, one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE's `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group or as LocalSystem. This is derived from the need to invoke the LoadUserProfile API, which mentions these requirements in its specification. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but this requirement automatically implies that the SE_TCB privilege is held by the nodemanager. For the Linux speakers in the audience, the requirement is basically to run the NM as root. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high-privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers.
I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service on the project is not trivial. was: This work item represents the Java side changes required to implement a secure windows container executor, based on the YARN-1063 changes on native/winutils side. Necessary changes include leveraging the winutils task createas to launch the container process as the required user and a secure localizer (launch localization as a separate process running as the container user). > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch > > > Windows Secure Container Executor (WCE) > > YARN-1063 adds the necessary infrasturcture to launch a process as a domain > user as a solution for the problem of having a security boundary between > processes executed in YARN containers and the Hadoop ser
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018798#comment-14018798 ] Hudson commented on YARN-2061: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1765 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1765/]) YARN-2061. Revisit logging levels in ZKRMStateStore. (Ray Chiang via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600498) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java > Revisit logging levels in ZKRMStateStore > - > > Key: YARN-2061 > URL: https://issues.apache.org/jira/browse/YARN-2061 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Ray Chiang >Priority: Minor > Labels: newbie > Fix For: 2.5.0 > > Attachments: YARN2061-01.patch > > > ZKRMStateStore has a few places where it is logging at the INFO level. We > should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2119) DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT
[ https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018797#comment-14018797 ] Hudson commented on YARN-2119: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1765 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1765/]) YARN-2119. DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600484) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServer.java > DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT > --- > > Key: YARN-2119 > URL: https://issues.apache.org/jira/browse/YARN-2119 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.5.0 > > Attachments: YARN-2119.patch > > > The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] > introduced an method to get web proxy bind address with the incorrect default > port. Because all the users of the method (only 1 user) ignores the port, its > not breaking anything yet. Fixing it in case someone else uses this in the > future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018758#comment-14018758 ] Junping Du commented on YARN-2030: -- bq. Junping Du, can you help with the review and commit ? thx. Sure, I am glad to help here. [~decster], we want an abstract class here because we want to provide as simple an interface as possible to the end user (or AM), who can simply call ApplicationStateData.newInstance() without involving the complexity of ApplicationStateDataPBImpl. Does that make sense? > Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch > > > Now the logic to handle different store events in handleStoreEvent() is as > following: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This is not only confuse people but also led to mistake easily. We may > leverage state machine to simply this even no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1977) Add tests on getApplicationRequest with filtering start time range
[ https://issues.apache.org/jira/browse/YARN-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018709#comment-14018709 ] Hadoop QA commented on YARN-1977: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641524/YARN-1977.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3915//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3915//console This message is automatically generated. > Add tests on getApplicationRequest with filtering start time range > -- > > Key: YARN-1977 > URL: https://issues.apache.org/jira/browse/YARN-1977 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > Fix For: 2.4.1 > > Attachments: YARN-1977.patch > > > There is no unit test to verify if request with start time range works to get > right application list, we should add it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2061) Revisit logging levels in ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018682#comment-14018682 ] Hudson commented on YARN-2061: -- FAILURE: Integrated in Hadoop-Yarn-trunk #574 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/574/]) YARN-2061. Revisit logging levels in ZKRMStateStore. (Ray Chiang via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600498) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java > Revisit logging levels in ZKRMStateStore > - > > Key: YARN-2061 > URL: https://issues.apache.org/jira/browse/YARN-2061 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Ray Chiang >Priority: Minor > Labels: newbie > Fix For: 2.5.0 > > Attachments: YARN2061-01.patch > > > ZKRMStateStore has a few places where it is logging at the INFO level. We > should change these to DEBUG or TRACE level messages. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2119) DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT
[ https://issues.apache.org/jira/browse/YARN-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018681#comment-14018681 ] Hudson commented on YARN-2119: -- FAILURE: Integrated in Hadoop-Yarn-trunk #574 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/574/]) YARN-2119. DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT. (Anubhav Dhoot via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1600484) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServer.java > DEFAULT_PROXY_ADDRESS should use DEFAULT_PROXY_PORT > --- > > Key: YARN-2119 > URL: https://issues.apache.org/jira/browse/YARN-2119 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Fix For: 2.5.0 > > Attachments: YARN-2119.patch > > > The fix for [YARN-1590|https://issues.apache.org/jira/browse/YARN-1590] > introduced an method to get web proxy bind address with the incorrect default > port. Because all the users of the method (only 1 user) ignores the port, its > not breaking anything yet. Fixing it in case someone else uses this in the > future. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018642#comment-14018642 ] Remus Rusanu commented on YARN-1972: There is more feedback to address (DRY between LCE and WCE localization launch, proper place for localization classpath jar). I do not plan to address launch nice-ness in this JIRA. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.2.patch Patch.2, rebased to current trunk, with review feedback: - DCE and WCE no longer create user file cache, this is done solely by the localizer initDirs. DCE Test modified to reflect this. DCE.createUserCacheDirs renamed to createUserAppCacheDirs accordingly - namenodeGroup -> nodeManagerGroup - removed appLocalizationCounter, use locId instead (container ID) as the winutils "jobName" for the localizer runas launch > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch, YARN-1972.2.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018625#comment-14018625 ] Remus Rusanu commented on YARN-1972: Fix is trivial, WCE startLocalizer should use locId for the winutils jobName, not appId. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018593#comment-14018593 ] Hadoop QA commented on YARN-2124: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648441/YARN-2124.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3914//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3914//console This message is automatically generated. > ProportionalCapacityPreemptionPolicy cannot work because it's initialized > before scheduler initialized > -- > > Key: YARN-2124 > URL: https://issues.apache.org/jira/browse/YARN-2124 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2124.patch > > > When I play with scheduler with preemption, I found > ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM > start > {code} > 2014-06-05 11:01:33,201 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw > an Exception. > java.lang.NullPointerException > at > org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) > at java.lang.Thread.run(Thread.java:744) > {code} > This is caused by ProportionalCapacityPreemptionPolicy needs > ResourceCalculator from CapacityScheduler. 
But > ProportionalCapacityPreemptionPolicy get initialized before CapacityScheduler > initialized. So ResourceCalculator will set to null in > ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
[ https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018592#comment-14018592 ] Hadoop QA commented on YARN-2125: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648444/YARN-2125.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3913//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3913//console This message is automatically generated. > ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled > --- > > Key: YARN-2125 > URL: https://issues.apache.org/jira/browse/YARN-2125 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Minor > Attachments: YARN-2125.patch > > > Currently, logToCSV() will be output using LOG.info() in > ProportionalCapacityPreemptionPolicy. Which will generate non-human-readable > texts in resource manager's log every several seconds, like > {code} > ... > 2014-06-05 15:57:07,603 INFO > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: > QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, > 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, > 3072, 2, 3072, 2, 0, 0, 0, 0 > 2014-06-05 15:57:10,603 INFO > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: > QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, > 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, > 3072, 2, 3072, 2, 0, 0, 0, 0 > ... > {code} > It's better to make it output when debug enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018583#comment-14018583 ] Remus Rusanu commented on YARN-1972: Tracked this down to {code}LocalizerRunner.run(){code}: {code} exec.startLocalizer(nmPrivateCTokensPath, localizationServerAddress, context.getUser(), ConverterUtils.toString( context.getContainerId(). getApplicationAttemptId().getApplicationId()), {code} Notice the use of application id, not attempt id when launching the localizer. I will change this to attempt id to eliminate the possibility of duplicates. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)
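For clarity, the difference between the identifier passed today and the one proposed above can be shown side by side. This is an illustrative sketch only; the class and method names are hypothetical, and the actual call site is LocalizerRunner.run() as quoted above.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

// Hypothetical helper names, for illustration only.
public class LocalizerIdSketch {
  // Today: the application id string -- identical for every container of the
  // same application localizing on this node, hence the duplicates.
  static String currentLocalizerId(ContainerId containerId) {
    return containerId.getApplicationAttemptId().getApplicationId().toString();
  }

  // Proposed in the comment above: the application attempt id string, which
  // narrows the collisions relative to the plain application id.
  static String proposedLocalizerId(ContainerId containerId) {
    return containerId.getApplicationAttemptId().toString();
  }
}
{code}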
[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018573#comment-14018573 ] Wangda Tan commented on YARN-2124: -- And I just verified that this error does not occur in 2.4.0. > ProportionalCapacityPreemptionPolicy cannot work because it's initialized > before scheduler initialized > -- > > Key: YARN-2124 > URL: https://issues.apache.org/jira/browse/YARN-2124 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2124.patch > > > When I play with scheduler with preemption, I found > ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM > start > {code} > 2014-06-05 11:01:33,201 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw > an Exception. > java.lang.NullPointerException > at > org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) > at java.lang.Thread.run(Thread.java:744) > {code} > This is caused by ProportionalCapacityPreemptionPolicy needs > ResourceCalculator from CapacityScheduler. But > ProportionalCapacityPreemptionPolicy get initialized before CapacityScheduler > initialized. So ResourceCalculator will set to null in > ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
[ https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2125: - Attachment: YARN-2125.patch Attached a simple patch to fix this. > ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled > --- > > Key: YARN-2125 > URL: https://issues.apache.org/jira/browse/YARN-2125 > Project: Hadoop YARN > Issue Type: Task > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Minor > Attachments: YARN-2125.patch > > > Currently, logToCSV() will be output using LOG.info() in > ProportionalCapacityPreemptionPolicy. Which will generate non-human-readable > texts in resource manager's log every several seconds, like > {code} > ... > 2014-06-05 15:57:07,603 INFO > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: > QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, > 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, > 3072, 2, 3072, 2, 0, 0, 0, 0 > 2014-06-05 15:57:10,603 INFO > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: > QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, > 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, > 3072, 2, 3072, 2, 0, 0, 0, 0 > ... > {code} > It's better to make it output when debug enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
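The attached patch is not reproduced here, but the change this issue describes is the usual guard pattern: build and emit the CSV line only when debug logging is on. A minimal sketch, assuming the existing logToCSV(...) helper and the class's commons-logging LOG field; the argument passed to logToCSV is illustrative.
{code}
// Sketch of the guarded logging: the CSV line is neither built nor emitted
// unless debug logging is enabled for ProportionalCapacityPreemptionPolicy.
if (LOG.isDebugEnabled()) {
  LOG.debug("QUEUESTATE: " + logToCSV(queueNames));
}
{code}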
[jira] [Created] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
Wangda Tan created YARN-2125: Summary: ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled Key: YARN-2125 URL: https://issues.apache.org/jira/browse/YARN-2125 Project: Hadoop YARN Issue Type: Task Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Attachments: YARN-2125.patch Currently, the output of logToCSV() is written with LOG.info() in ProportionalCapacityPreemptionPolicy, which generates non-human-readable text in the resource manager's log every few seconds, like {code} ... 2014-06-05 15:57:07,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 2014-06-05 15:57:10,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 ... {code} It would be better to emit this output only when debug logging is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2124: - Attachment: YARN-2124.patch Attached a patch to solve this problem. It moves ProportionalCapacityPreemptionPolicy.init(...) from RMActiveService.init() to SchedulerMonitor.serviceInit(...). SchedulerMonitor is always added after the Scheduler, so ProportionalCapacityPreemptionPolicy is now initialized only after the Scheduler has been initialized. Also added a test for ProportionalCapacityPreemptionPolicy to guard against regressions in the future. > ProportionalCapacityPreemptionPolicy cannot work because it's initialized > before scheduler initialized > -- > > Key: YARN-2124 > URL: https://issues.apache.org/jira/browse/YARN-2124 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler >Affects Versions: 3.0.0 >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2124.patch > > > When I play with scheduler with preemption, I found > ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM > start > {code} > 2014-06-05 11:01:33,201 ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw > an Exception. > java.lang.NullPointerException > at > org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) > at > org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) > at java.lang.Thread.run(Thread.java:744) > {code} > This is caused by ProportionalCapacityPreemptionPolicy needs > ResourceCalculator from CapacityScheduler. But > ProportionalCapacityPreemptionPolicy get initialized before CapacityScheduler > initialized. So ResourceCalculator will set to null in > ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
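A rough sketch of the ordering fix described above; the constructor shape and the exact parameters of SchedulingEditPolicy.init(...) are approximations, not a verbatim excerpt of the patch.
{code}
// Sketch only: illustrates initializing the edit policy from the monitor's
// serviceInit instead of from RMActiveService.init().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingEditPolicy;

public class SchedulingMonitorSketch extends AbstractService {
  private final SchedulingEditPolicy scheduleEditPolicy;
  private final RMContext rmContext;

  public SchedulingMonitorSketch(RMContext rmContext, SchedulingEditPolicy policy) {
    super("SchedulingMonitor (" + policy.getPolicyName() + ")");
    this.rmContext = rmContext;
    this.scheduleEditPolicy = policy;
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Services are inited in the order they were added to the RM's active
    // services, and the monitor is added after the scheduler, so by the time
    // this runs the scheduler (and its ResourceCalculator) already exists.
    scheduleEditPolicy.init(conf, rmContext, rmContext.getScheduler());
    super.serviceInit(conf);
  }
}
{code}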
[jira] [Created] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
Wangda Tan created YARN-2124: Summary: ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized Key: YARN-2124 URL: https://issues.apache.org/jira/browse/YARN-2124 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical While experimenting with the scheduler with preemption enabled, I found that ProportionalCapacityPreemptionPolicy cannot work: an NPE is raised when the RM starts {code} 2014-06-05 11:01:33,201 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:744) {code} This is caused by ProportionalCapacityPreemptionPolicy needing the ResourceCalculator from CapacityScheduler: ProportionalCapacityPreemptionPolicy is initialized before CapacityScheduler, so its ResourceCalculator is still null. -- This message was sent by Atlassian JIRA (v6.2#6252)
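To make the cause of the NPE above concrete: Resources.greaterThan delegates the comparison to the ResourceCalculator it is given, so calling it while the policy still holds a null calculator fails exactly at the frame shown in the trace. A minimal repro sketch under that assumption (the resource values are arbitrary):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NullCalculatorRepro {
  public static void main(String[] args) {
    // What ProportionalCapacityPreemptionPolicy ends up holding when it is
    // initialized before the CapacityScheduler: a null ResourceCalculator.
    ResourceCalculator rc = null;
    Resource cluster = Resources.createResource(8192, 8);
    // greaterThan(rc, cluster, lhs, rhs) delegates to rc.compare(...), so a
    // null rc throws NullPointerException inside Resources.greaterThan.
    Resources.greaterThan(rc, cluster,
        Resources.createResource(4096, 4), Resources.createResource(2048, 2));
  }
}
{code}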
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018556#comment-14018556 ] Remus Rusanu commented on YARN-1972: [~vinodkv] About the uniqueness of appid for localizer: it is not unique when multiple tasks are being localized for the same job on the same node. Simply running pi with 100 splits on a 2-node cluster results in many duplicate errors. For task localization it should be the task_id, I believe. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Attachments: YARN-1972.1.patch > > > This work item represents the Java side changes required to implement a > secure windows container executor, based on the YARN-1063 changes on > native/winutils side. > Necessary changes include leveraging the winutils task createas to launch the > container process as the required user and a secure localizer (launch > localization as a separate process running as the container user). -- This message was sent by Atlassian JIRA (v6.2#6252)