[jira] [Assigned] (YARN-1801) NPE in public localizer
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo reassigned YARN-1801: - Assignee: Hong Zhiguo > NPE in public localizer > --- > > Key: YARN-1801 > URL: https://issues.apache.org/jira/browse/YARN-1801 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.2.0 >Reporter: Jason Lowe >Assignee: Hong Zhiguo >Priority: Critical > > While investigating YARN-1800 found this in the NM logs that caused the > public localizer to shutdown: > {noformat} > 2014-01-23 01:26:38,655 INFO localizer.ResourceLocalizationService > (ResourceLocalizationService.java:addResource(651)) - Downloading public > rsrc:{ > hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, > 1390440382009, FILE, null } > 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService > (ResourceLocalizationService.java:run(726)) - Error: Shutting down > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712) > 2014-01-23 01:26:38,656 INFO localizer.ResourceLocalizationService > (ResourceLocalizationService.java:run(728)) - Public cache exiting > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1801) NPE in public localizer
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-1801: -- Attachment: YARN-1801.patch {code} Path local = completed.get(); {code} may throw an ExecutionException, and assoc may be null. When both of those happen, we get an NPE in {code} LOG.info("Failed to download rsrc " + assoc.getResource(), e.getCause()); {code} This is exactly the line "ResourceLocalizationService.java:712" of commit dd9c059 (2013-10-05, YARN-1254).
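The failure mode described above can be sketched outside of YARN. This is an illustrative stand-in, not the actual ResourceLocalizationService code: the method name handleCompleted and the pending map are hypothetical, but the shape matches the comment, where the Future's get() throws ExecutionException and the associated request may be null.

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Toy model of the PublicLocalizer completion path described in the comment.
class PublicLocalizerSketch {
    static String handleCompleted(Future<String> completed,
                                  Map<Future<String>, String> pending) {
        // assoc may be null if the completed Future is not tracked in pending
        String assoc = pending.remove(completed);
        try {
            String local = completed.get();   // may throw ExecutionException
            return "downloaded " + local + " for " + assoc;
        } catch (ExecutionException | InterruptedException e) {
            // The fix: don't dereference assoc blindly while logging the failure
            String rsrc = (assoc != null) ? assoc : "unknown resource";
            return "failed to download " + rsrc + ": " + e.getCause();
        }
    }
}
```

When both conditions coincide (a failed download whose Future is no longer in the map), the guarded branch logs a placeholder instead of throwing the NPE.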
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005700#comment-14005700 ] Hadoop QA commented on YARN-2049: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646158/YARN-2049.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3788//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3788//console This message is automatically generated. 
> Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005718#comment-14005718 ] Steve Loughran commented on YARN-2092: -- This seems to stem from the HADOOP-10104 patch, which went in because the 2.2+ version of Jackson was so out of date it was breaking other things. I'm not sure it's so much incompatible as that TEZ is trying to push in its own version of Jackson, which then leads to classpath mixing problems. Even if you try to push one set of JARs in ahead of the other, things are going to break. I know, I've tried. Jackson 1.x should be compatible at run time with code built for previous versions. If there's a link problem there, then it's something we can take up with the Jackson team. > Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to > 2.5.0-SNAPSHOT > > > Key: YARN-2092 > URL: https://issues.apache.org/jira/browse/YARN-2092 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Hitesh Shah > > Came across this when trying to integrate with the timeline server. Using a > 1.8.8 dependency of jackson works fine against 2.4.0 but fails against > 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user > jars are first in the classpath.
[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
[ https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005722#comment-14005722 ] Binglin Chang commented on YARN-2088: - Hi Zhiguo, thanks for the comments, nice catch. Those two lines are used in every record class, so deleting them in a single place would actually break the code convention, and it's not related to this JIRA. We may discuss whether to delete them all in another JIRA. > Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder > > > Key: YARN-2088 > URL: https://issues.apache.org/jira/browse/YARN-2088 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: YARN-2088.v1.patch > > > Some fields (set, list) are added to proto builders multiple times; we need to > clear those fields before adding, otherwise the resulting proto contains extra > contents.
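The clear-before-add pattern the description calls for can be sketched with a plain list standing in for a generated protobuf builder's repeated field. The class and method names here are illustrative, not the actual PBImpl code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a protobuf builder whose repeated field would accumulate
// duplicates if mergeLocalToBuilder() forgot to clear it before re-adding.
class BuilderMergeSketch {
    final List<String> repeatedField = new ArrayList<>();

    void mergeLocalToBuilder(List<String> local) {
        repeatedField.clear();        // the fix: clear before re-adding
        repeatedField.addAll(local);
    }
}
```

Without the clear() call, every merge appends the local contents again, which is exactly the "result proto contains more contents" symptom in the description.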
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005753#comment-14005753 ] Binglin Chang commented on YARN-2030: - Hi Jian He, thanks for the comments. It looks like the PBImpls already have ProtoBase as a superclass, so we can't change the interface to an abstract class: {code} public class ApplicationAttemptStateDataPBImpl extends ProtoBase implements ApplicationAttemptStateData { {code} > Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch > > > Now the logic to handle different store events in handleStoreEvent() is as > follows: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || (event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This not only confuses people but is also error-prone. We may leverage a > state machine to simplify this even without state transitions.
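As a rough illustration of what the issue description is asking for, the nested event-type ladder can be flattened into a lookup table keyed on the event type. This is a toy sketch under that assumption, not YARN's actual StateMachine factory API:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy dispatch table replacing the nested event-type if/else chain.
class StoreEventDispatchSketch {
    enum EventType { STORE_APP, UPDATE_APP, STORE_APP_ATTEMPT, UPDATE_APP_ATTEMPT, REMOVE_APP }

    final Map<EventType, Consumer<String>> handlers = new EnumMap<>(EventType.class);
    final StringBuilder log = new StringBuilder();

    StoreEventDispatchSketch() {
        handlers.put(EventType.STORE_APP, app -> log.append("store:").append(app));
        handlers.put(EventType.UPDATE_APP, app -> log.append("update:").append(app));
        handlers.put(EventType.REMOVE_APP, app -> log.append("remove:").append(app));
        // ...remaining event types registered the same way
    }

    void handleStoreEvent(EventType type, String app) {
        Consumer<String> h = handlers.get(type);
        if (h == null) {
            throw new IllegalStateException("no handler for " + type);
        }
        h.accept(app);
    }
}
```

Each event type maps to exactly one handler, so adding a new event means registering one entry instead of threading another branch through the ladder.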
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005804#comment-14005804 ] Steve Loughran commented on YARN-2092: -- I should add that the underlying issue is that the AM gets the entire CP from the {{yarn.lib.classpath}}. That's mandatory to pick up a version of the Hadoop binaries (and -site.xml files) compatible with the rest of the cluster, but it brings in all the other dependencies which Hadoop itself relies on. As Hadoop evolves, this problem will only continue. The only viable long-term solution is to somehow support OSGi-launched AMs, so the AM only gets the org.apache.hadoop classes from the Hadoop JARs and has to explicitly add everything else itself. See HADOOP-7977 for this - maybe it's something we could target for Hadoop 3.0, driven by the needs of AMs.
[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005863#comment-14005863 ] Rohith commented on YARN-1366: -- bq. I mean, what will go wrong if we allow unregister without register? Is it fundamentally wrong? Allowing unregister without register moves the application to the FINISHED state (after handling the unregistered event at LAUNCHED), which is supposed to be the FAILED state. If that is acceptable, then it's fine to go ahead. > ApplicationMasterService should Resync with the AM upon allocate call after > restart > --- > > Key: YARN-1366 > URL: https://issues.apache.org/jira/browse/YARN-1366 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Rohith > Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.patch, > YARN-1366.prototype.patch, YARN-1366.prototype.patch > > > The ApplicationMasterService currently sends a resync response to which the > AM responds by shutting down. The AM behavior is expected to change to > calling resyncing with the RM. Resync means resetting the allocate RPC > sequence number to 0 and the AM should send its entire outstanding request to > the RM. Note that if the AM is making its first allocate call to the RM then > things should proceed like normal without needing a resync. The RM will > return all containers that have completed since the RM last synced with the > AM. Some container completions may be reported more than once.
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006066#comment-14006066 ] Eric Payne commented on YARN-415: - The Generic Application History Server stores all of the information about containers that is needed to calculate memory-seconds and vcore-seconds. Right now, since the Generic Application History Server is tied closely to the Timeline Server, this does not work on a secured cluster. Also, the information is only available via the REST API right now, and some scripting and parsing of the REST responses would be needed to roll up metrics for each app. So, I think this JIRA would still be very helpful and useful. FYI, on an unsecured cluster with the Generic Application History Server and the Timeline Server configured and running, the following REST APIs give enough information about an app to calculate memory-seconds and vcore-seconds: {panel:title=Get list of app attempts for a specified appID|titleBGColor=#F7D6C1} curl --compressed -H "Accept: application/json" -X GET "http://<host>:<port>/ws/v1/applicationhistory/apps/<appId>/appattempts" {panel} {panel:title=For each app attempt, get all container info|titleBGColor=#F7D6C1} curl --compressed -H "Accept: application/json" -X GET "http://<host>:<port>/ws/v1/applicationhistory/apps/<appId>/appattempts/<appAttemptId>/containers" {panel} > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. 
To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
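The per-container formula quoted in the description sums reserved memory times container lifetime. A minimal sketch of that arithmetic (a hypothetical helper, not part of YARN or the proposed patch):

```java
// Sketch of the chargeback formula from the description:
// memory-seconds = (reserved MB of container 1 * lifetime of container 1)
//                + ... + (reserved MB of container n * lifetime of container n)
class MemorySecondsSketch {
    static long memoryMbSeconds(long[] reservedMb, long[] lifetimeSeconds) {
        long total = 0;
        for (int i = 0; i < reservedMb.length; i++) {
            total += reservedMb[i] * lifetimeSeconds[i];
        }
        return total;
    }
}
```

Vcore-seconds would follow the same shape with reserved vcores in place of reserved MB.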
[jira] [Updated] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1474: - Attachment: YARN-1474.16.patch Rebased on trunk. > Make schedulers services > > > Key: YARN-1474 > URL: https://issues.apache.org/jira/browse/YARN-1474 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.3.0, 2.4.0 >Reporter: Sandy Ryza >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1474.1.patch, YARN-1474.10.patch, > YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, > YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, > YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, > YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch > > > Schedulers currently have a reinitialize but no start and stop. Fitting them > into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006138#comment-14006138 ] Varun Vasudev commented on YARN-2049: - Couple of things: 1. In the function managementOperation, should there be a null check for token? 2. In the function managementOperation, you call secretManager.cancelToken(dt, UserGroupInformation.getCurrentUser().getUserName()) - should you use getCurrentUser().getUserName() or ownerUGI.getUserName()? The reason I ask is that when creating the token, you're using ownerUGI.
[jira] [Updated] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-1365: Attachment: YARN-1365.003.patch Fixed race conditions in the test that was failing. The failing test would only repro in hudson after uploading patch. > ApplicationMasterService to allow Register and Unregister of an app that was > running before restart > --- > > Key: YARN-1365 > URL: https://issues.apache.org/jira/browse/YARN-1365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Anubhav Dhoot > Attachments: YARN-1365.001.patch, YARN-1365.002.patch, > YARN-1365.003.patch, YARN-1365.initial.patch > > > For an application that was running before restart, the > ApplicationMasterService currently throws an exception when the app tries to > make the initial register or final unregister call. These should succeed and > the RMApp state machine should transition to completed like normal. > Unregistration should succeed for an app that the RM considers complete since > the RM may have died after saving completion in the store but before > notifying the AM that the AM is free to exit. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2095) Large MapReduce Job stops responding
Clay McDonald created YARN-2095: --- Summary: Large MapReduce Job stops responding Key: YARN-2095 URL: https://issues.apache.org/jira/browse/YARN-2095 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Environment: CentOS 6.3 (x86_64) on VMware 10 running HDP-2.0.6 Reporter: Clay McDonald Priority: Blocker Very large jobs (7,455 mappers and 999 reducers) hang. Jobs run well, but logging to the container logs stops after 33 hours and the job appears to be hung. The status of the job is "RUNNING". No error messages were found in the logs.
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006222#comment-14006222 ] Abin Shahab commented on YARN-1964: --- Do others have comments on it, [~acmurthy]? > Create Docker analog of the LinuxContainerExecutor in YARN > -- > > Key: YARN-1964 > URL: https://issues.apache.org/jira/browse/YARN-1964 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.2.0 >Reporter: Arun C Murthy >Assignee: Abin Shahab > Attachments: yarn-1964-branch-2.2.0-docker.patch, > yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, > yarn-1964-docker.patch > > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In the context of YARN, support for Docker will provide a very elegant > solution to allow applications to *package* their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > the requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine).
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006248#comment-14006248 ] Wei Yan commented on YARN-596: -- Hey [~sandyr], sorry for the late reply. I'm still confused here. As you said, a queue is safe and doesn't allow preemption only if it satisfies the condition "(usage.memory <= fairshare.memory) && (usage.vcores <= fairshare.vcores)". This condition works fine for DRF. But for FairSharePolicy, since fairshare.vcores is always 0 (except for root), this condition cannot be satisfied, so all queues are always allowed to preempt. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting its containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over its fair share, is just as likely to have containers preempted > as an application that is over its fair share.
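Wei Yan's point can be made concrete with a tiny check. The field names mirror the comment's notation, not the actual FairScheduler code:

```java
// A queue is shielded from preemption only when usage is at or below its
// fair share in every resource dimension (the condition quoted above).
class PreemptionSafetySketch {
    static boolean safeFromPreemption(long usedMemory, long fairMemory,
                                      long usedVcores, long fairVcores) {
        return usedMemory <= fairMemory && usedVcores <= fairVcores;
    }
}
```

Under FairSharePolicy, fairVcores is 0 for every non-root queue, so any queue using at least one vcore fails the second test and stays preemptable no matter how far under its memory fair share it is.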
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006453#comment-14006453 ] Vinod Kumar Vavilapalli commented on YARN-2049: --- Thanks for working on this, Zhijie! Some comments on the patch TimelineKerberosAuthenticator - Not clear what TimelineDelegationTokenResponse validateAndParseResponse() is doing with class loading, construction etc. Can you explain and may be also add code comments? TimelineAuthenticationFilter - Explain what getConfiguration() overrides and add a code comment? TimelineKerberosAuthenticationHandler - This borrows a lot of code from HttpFSKerberosAuthenticationHandler.java. We should refactor either here or in a separate JIRA. Nits - TestDistributedShell change is unnecessary - TimelineDelegationTokenSelector: Wrap the debug logging in debugEnabled checks. - ApplicationHistoryServer.java -- Forced config setting of the filter: What happens if the cluster has another authentication filter? Is the guideline to override it (which is what the patch is doing)? h4. Source code refactor TimelineKerberosAuthenticationHandler - Rename to TimelineClientAuthenticationService? TimelineKerberosAuthenticator - It seems like TimelineKerberosAuthenticator is completely client side code and so should be moved to the client module - To do that we will extract some of the constants and the DelegationTokenOperation enum as top level entities into the common module. TimelineAuthenticationFilterInitializer - This is almost the same as the common AuthenticationFilterInitializer.java. Let's just refactor AuthenticationFilterInitializer.java and extend it to only change class names. Similarly to how TimelineAuthenticationFilter extends AuthenticationFilter. TimelineDelegationTokenSecretManagerService: - We are sharing the configs for update/renewal etc with the ResourceManager. 
That seems fine for now - logically you want both tokens to follow similar expiry and life-cycles. - This also shares a bunch of code with org/apache/hadoop/lib/service/security/DelegationTokenManagerService. We may or may not want to reuse some code - just throwing it out there.
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006459#comment-14006459 ] Mayank Bansal commented on YARN-2074: - Thanks [~jianhe] for the patch. Overall it looks good. Some nits: {code} maxAppAttempts <= attempts.size() {code} Can we use this instead? {code} maxAppAttempts == getAttemptFailureCount() {code} {code} public boolean isPreempted() { return getDiagnostics().contains(SchedulerUtils.PREEMPTED_CONTAINER); } {code} I think we need to compare the exit status (-102) instead of relying on the string message. > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures.
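The suggested check can be sketched as follows. The constant value comes from the comment above, which cites -102 as the preempted exit status; the class here is an illustrative stand-in, not YARN's ContainerExitStatus:

```java
// Compare the container's numeric exit status rather than substring-matching
// the diagnostics message, which is brittle if the message text ever changes.
class PreemptedExitStatusSketch {
    static final int PREEMPTED = -102;   // value cited in the review comment

    static boolean isPreempted(int exitStatus) {
        return exitStatus == PREEMPTED;
    }
}
```

A diagnostics-string check would also misfire on any unrelated message that happened to contain the preemption phrase; the exit-status comparison has no such false positives.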
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006465#comment-14006465 ] Jian He commented on YARN-1408: --- Hi [~sunilg], I agree that we should remove the container from newlyAllocatedContainers when preemption happens. As for the race condition you mentioned, we may also preempt an ACQUIRED container? In fact, I think the best container to preempt is an ALLOCATED container, as these containers are not yet alive from the user's perspective. As for the race condition where [the RM loses the resource request], today the resource request is decremented when the container is allocated. We may change it to decrement the resource request only when the container is pulled by the AM? We can do this separately if it makes sense. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > A jobA task which uses queue b capacity has been preempted and killed. > This caused the problem below: > 1. A new container was allocated for jobA in Queue A as per a node update > from an NM. > 2. This container was preempted immediately as per preemption. 
> Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006486#comment-14006486 ] Tsuyoshi OZAWA commented on YARN-1365: -- [~adhoot], thank you for updating the patch. Looks good to me overall. Minor nits: we can remove the following unused values: {code} // TestApplicationMasterLauncher.java boolean thrown = false; {code} {code} // TestRMRestart.java Map rmAppState = rmState.getApplicationState(); {code}
[jira] [Updated] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2074: -- Attachment: YARN-2074.3.patch > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006525#comment-14006525 ] Jian He commented on YARN-2074: --- Thanks Xuan and Mayank for the review! bq. maxAppAttempts == getAttemptFailureCount() good point. Fixed the attempt to compare against the exit status to determine whether it was preempted or not. > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2049: -- Attachment: YARN-2049.6.patch Update the patch accordingly > Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006542#comment-14006542 ] Zhijie Shen commented on YARN-2049: --- Thanks for the review, Vinod and Varun! Please see the responses below: bq. 1. In the function managementOperation, should there be a null check for token? There is the following code before processing each dtOp: {code} if (dtOp.requiresKerberosCredentials() && token == null) { response.sendError(HttpServletResponse.SC_UNAUTHORIZED, MessageFormat.format( "Operation [{0}] requires SPNEGO authentication established", dtOp)); requestContinues = false; } {code} Get and renew both require kerberos credentials, such that if token == null, the code will fall into this part. Cancel didn't require credentials before, following HttpFS's code. However, I think we should enforce kerberos credentials for cancel as well. After that, the NPE risk is gone. bq. In the function managementOperation, you call secretManager.cancelToken(dt, UserGroupInformation.getCurrentUser().getUserName()) - should you use getCurrentuser().getUserName? or ownerUGI.getUserName()? Good catch, we should use token.getUserName here as well. bq. TimelineKerberosAuthenticator Some errors may cause TimelineAuthenticator to not get the correct response. If the status code is not 200, the json content may contain the exception information from the server; we can use that information to recover the exception object. This is inspired by HttpFSUtils.validateResponse, but I changed it to use Jackson to parse the json content here. bq. TimelineAuthenticationFilter In the configuration we can simply set the authentication type to "kerberos", but in the timeline server, we want to replace it with the class name of the customized authentication service. Otherwise, the standard authentication handler will be used instead. I added the code comments there. bq. TimelineKerberosAuthenticationHandler bq. 
TimelineDelegationTokenSecretManagerService. Yeah, we need to look into how to reuse the existing code, but how about postponing it? I'm going to file a separate JIRA for the code refactoring. bq. TestDistributedShell change is unnecessary Removed. bq. TimelineDelegationTokenSelector: Wrap the debug logging in debugEnabled checks. Added the debugEnabled checks. bq. ApplicationHistoryServer.java Actually it will not override the other initializers. Instead, I just append a TimelineAuthenticationFilterInitializer. Anyway, I enhanced the condition here: not only must security be enabled, but "kerberos" authentication must also be desired. bq. TimelineKerberosAuthenticationHandler Done. bq. TimelineKerberosAuthenticator. Good suggestion. I split the code accordingly. bq. TimelineAuthenticationFilterInitializer AuthenticationFilterInitializer has a single method to do everything, and the prefix is a static variable, which makes it a bit difficult for me to override part of the code without changing AuthenticationFilterInitializer. Another issue is that AuthenticationFilterInitializer requires the user to supply a secret file, which is not actually required by AuthenticationFilter (HADOOP-10600). > Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
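The "wrap the debug logging in debugEnabled checks" point above can be sketched as follows. This is a minimal illustration, not YARN code: the `Log` interface is a hypothetical stand-in for Commons Logging, and the counter exists only to make the benefit (no message string built when debug is off) observable.

```java
// Hypothetical stand-in for the Commons Logging Log interface.
interface Log {
    boolean isDebugEnabled();
    void debug(String msg);
}

class TokenSelectorSketch {
    // Counts how many times the (potentially expensive) message is built.
    static int buildCount = 0;

    static void logToken(Log log, Object service) {
        // Guard: skip string concatenation entirely when debug is disabled.
        if (log.isDebugEnabled()) {
            buildCount++;
            log.debug("Looking for a token with service " + service);
        }
    }

    // Stub loggers for demonstration.
    static Log debugOff() {
        return new Log() {
            public boolean isDebugEnabled() { return false; }
            public void debug(String msg) { }
        };
    }

    static Log debugOn() {
        return new Log() {
            public boolean isDebugEnabled() { return true; }
            public void debug(String msg) { }
        };
    }
}
```

With debug off, the guard means no message string is ever constructed, which is the whole point of the review comment.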
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006548#comment-14006548 ] Mayank Bansal commented on YARN-1408: - I agree with [~jianhe] and [~devaraj.k]. We should be able to preempt the container in ALLOCATED state. bq. Today the resource request is decremented when container is allocated. we may change it to decrement the resource request only when the container is pulled by the AM ? I am not sure if that's the right thing, as you don't want to run into other race conditions where a container has been allocated but its capacity is given to some other AM. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
[ https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1913: -- Attachment: YARN-1913.patch > With Fair Scheduler, cluster can logjam when all resources are consumed by AMs > -- > > Key: YARN-1913 > URL: https://issues.apache.org/jira/browse/YARN-1913 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.3.0 >Reporter: bc Wong >Assignee: Wei Yan > Attachments: YARN-1913.patch, YARN-1913.patch > > > It's possible to deadlock a cluster by submitting many applications at once, > and have all cluster resources taken up by AMs. > One solution is for the scheduler to limit resources taken up by AMs, as a > percentage of total cluster resources, via a "maxApplicationMasterShare" > config. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006609#comment-14006609 ] Sandy Ryza commented on YARN-596: - Ah, I see what you're saying. Good point. In that case we'll probably need to push that check into the SchedulingPolicy and call it inside the loop in preemptContainer(). > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006613#comment-14006613 ] Jian He commented on YARN-1408: --- There seem to be more problems with the approach I mentioned: if the request is not updated at the time the container is allocated, and the AM doesn't make a subsequent allocate call, more containers will be allocated from the same request when NMs heartbeat > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006615#comment-14006615 ] Wei Yan commented on YARN-596: -- yes, we can check the queue's policy in the preCheck function. If DRF, we use Resources.fitsIn(); if Fair, we use DEFAULT_CALCULATOR. Sounds good? > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
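The policy-dependent pre-check being agreed on above could look roughly like the following sketch. All names here are illustrative, not the actual SchedulingPolicy/ResourceCalculator API, and resources are reduced to plain longs: under a DRF-style check any dimension over its share counts, while a plain Fair (DEFAULT_CALCULATOR-style) check looks at memory only.

```java
// Hypothetical sketch of checking "over fair share" differently per policy.
class PreCheckSketch {
    enum Policy { FAIR, DRF }

    static boolean overFairShare(Policy policy,
                                 long usedMem, long shareMem,
                                 long usedVcores, long shareVcores) {
        switch (policy) {
            case DRF:
                // Resources.fitsIn-style check: over share if ANY dimension
                // exceeds its share.
                return usedMem > shareMem || usedVcores > shareVcores;
            case FAIR:
            default:
                // DEFAULT_CALCULATOR-style check: memory only.
                return usedMem > shareMem;
        }
    }
}
```

The practical difference: a queue over its vcore share but under its memory share is preemptable under DRF but not under plain Fair.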
[jira] [Updated] (YARN-2012) Fair Scheduler : Default rule in queue placement policy can take a queue as an optional attribute
[ https://issues.apache.org/jira/browse/YARN-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2012: - Description: Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we can make the 'default' rule optionally point to a different queue as the default queue. This default queue can be a leaf queue or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). was: Currently the 'default' rule in the queue placement policy, if applied, puts the app in the root.default queue. It would be great if we can make the 'default' rule optionally point to a different queue as the default queue. This queue should be an existing queue; if not, we fall back to the root.default queue, hence keeping this rule as terminal. This default queue can be a leaf queue or it can also be a parent queue if the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). > Fair Scheduler : Default rule in queue placement policy can take a queue as > an optional attribute > - > > Key: YARN-2012 > URL: https://issues.apache.org/jira/browse/YARN-2012 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: scheduler > Attachments: YARN-2012-v1.txt, YARN-2012-v2.txt > > > Currently the 'default' rule in the queue placement policy, if applied, puts the app in > the root.default queue. It would be great if we can make the 'default' rule > optionally point to a different queue as the default queue. > This default queue can be a leaf queue or it can also be a parent queue if > the 'default' rule is nested inside the nestedUserQueue rule (YARN-1864). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006641#comment-14006641 ] Sandy Ryza commented on YARN-596: - Sounds good > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting it's containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over it's fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006645#comment-14006645 ] Hadoop QA commented on YARN-2049: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646401/YARN-2049.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/3789//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.client.TestRMAdminCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3789//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3789//console This message is automatically generated. 
> Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler
[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006646#comment-14006646 ] Wangda Tan commented on YARN-1368: -- [~jianhe], Thanks for addressing my comments, I've looked at your latest patch, only some minor comments 1) yarn_server_common_service_protos.proto: {code} + repeated ContainerRecoveryReportProto container_report = 6; {code} should be container_reports 2) AppSchedulingInfo.java: {code} +if (containerId >= containerIdCounter.get()) { + containerIdCounter.set(containerId); +} {code} Better to use compareAndSet in a while loop in case of race condition 3) It's better to add a test for ContainerRecoveryReport > Common work to re-populate containers’ state into scheduler > --- > > Key: YARN-1368 > URL: https://issues.apache.org/jira/browse/YARN-1368 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Jian He > Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, > YARN-1368.combined.001.patch, YARN-1368.preliminary.patch > > > YARN-1367 adds support for the NM to tell the RM about all currently running > containers upon registration. The RM needs to send this information to the > schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover > the current allocation state of the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
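Wangda's compareAndSet suggestion above could be sketched roughly as follows. Only the `containerIdCounter` field name comes from the quoted snippet; the class, method, and recovery scenario are hypothetical. The loop advances the counter to at least the recovered id without losing updates under concurrent callers, which the plain get/set in the quoted patch could.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the suggested compare-and-set loop.
class ContainerIdRecovery {
    static final AtomicLong containerIdCounter = new AtomicLong(0);

    // Advance the counter to at least recoveredId, safely under concurrency.
    static void recoverContainerId(long recoveredId) {
        long current = containerIdCounter.get();
        // Retry until either the CAS succeeds or another thread has already
        // advanced the counter to recoveredId or beyond.
        while (recoveredId > current
                && !containerIdCounter.compareAndSet(current, recoveredId)) {
            current = containerIdCounter.get();
        }
    }
}
```

The unguarded `get()` followed by `set()` in the quoted patch has a window where a concurrently allocated id could be overwritten; the CAS loop closes it.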
[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-596: - Attachment: YARN-596.patch Uploaded a new patch to address Sandy's comments. > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting its containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over its fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2073: --- Attachment: yarn-2073-3.patch Spoke to Sandy offline. We think there should be a utilization threshold after which preemption kicks in. The new patch is along those lines. > FairScheduler starts preempting resources even with free resources on the > cluster > - > > Key: YARN-2073 > URL: https://issues.apache.org/jira/browse/YARN-2073 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, > yarn-2073-3.patch > > > Preemption should kick in only when the currently available slots don't match > the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006676#comment-14006676 ] Sandy Ryza commented on YARN-596: - The current patch uses the queue's policy in preemptContainerPreCheck. We should use the parent's policy. (Consider the case of a leaf queue with FIFO under a parent queue with DRF - we should use DRF to decide whether we should skip the leaf queue). Also, we should add a new method to SchedulingPolicy instead of checking with instanceof. {code} + if (Resources.fitsIn(getResourceUsage(), getFairShare())) { +return false; + } else { +return true; + } {code} Can just use "return !Resources.fitsIn(getResourceUsage(), getFairShare())" (note the negation, since the original returns false when the usage fits). > In fair scheduler, intra-application container priorities affect > inter-application preemption decisions > --- > > Key: YARN-596 > URL: https://issues.apache.org/jira/browse/YARN-596 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, > YARN-596.patch, YARN-596.patch, YARN-596.patch > > > In the fair scheduler, containers are chosen for preemption in the following > way: > All containers for all apps that are in queues that are over their fair share > are put in a list. > The list is sorted in order of the priority that the container was requested > in. > This means that an application can shield itself from preemption by > requesting its containers at higher priorities, which doesn't really make > sense. > Also, an application that is not over its fair share, but that is in a queue > that is over its fair share is just as likely to have containers preempted > as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
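As a sanity check on the simplification discussed above, here is a minimal sketch with the Resource type and `Resources.fitsIn` reduced to plain longs (all names illustrative). The quoted if/else returns false when the usage fits within the fair share, so the equivalent one-liner must carry a negation.

```java
// Minimal sketch of the boolean simplification; not the FairScheduler API.
class PreemptCheckSketch {
    // fitsIn(usage, share): does the usage fit within the share?
    static boolean fitsIn(long usage, long share) {
        return usage <= share;
    }

    // Equivalent one-liner for the quoted if/else: the queue is over its
    // fair share exactly when the usage does NOT fit within the share.
    static boolean isOverFairShare(long usage, long fairShare) {
        return !fitsIn(usage, fairShare);
    }
}
```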
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006682#comment-14006682 ] Sandy Ryza commented on YARN-2073: -- {code} + /** Preemption related variables */ {code} Nit: use "//" like the other comments. Can you add the new property to the Fair Scheduler doc? {code} + updateRootQueueMetrics(); {code} My understanding is that this shouldn't be needed in shouldAttemptPreemption. Have you observed otherwise? Would it be possible to move the TestFairScheduler refactoring to a separate JIRA? If it's too difficult to disentangle at this point, I'm ok with it. > FairScheduler starts preempting resources even with free resources on the > cluster > - > > Key: YARN-2073 > URL: https://issues.apache.org/jira/browse/YARN-2073 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, > yarn-2073-3.patch > > > Preemption should kick in only when the currently available slots don't match > the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
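The utilization-threshold idea behind the patch under review could be sketched as follows. This is a hypothetical illustration, not the actual FairScheduler code: the class, method, and field names are assumptions, and cluster resources are reduced to a single memory dimension.

```java
// Hypothetical sketch: gate preemption attempts behind a cluster
// utilization threshold, so free clusters never preempt.
class PreemptionGate {
    final double utilizationThreshold; // e.g. 0.8 = 80% of cluster in use

    PreemptionGate(double utilizationThreshold) {
        this.utilizationThreshold = utilizationThreshold;
    }

    // Only consider preempting once used capacity crosses the threshold.
    boolean shouldAttemptPreemption(long usedMemory, long clusterMemory) {
        if (clusterMemory <= 0) {
            return false; // no resources registered yet
        }
        return (double) usedMemory >= utilizationThreshold * clusterMemory;
    }
}
```

With free resources available (utilization below the threshold), starved apps should be satisfied by normal scheduling rather than by killing containers, which is exactly the bug this JIRA describes.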
[jira] [Resolved] (YARN-2095) Large MapReduce Job stops responding
[ https://issues.apache.org/jira/browse/YARN-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2095. --- Resolution: Invalid [~sunliners81], we have run much bigger jobs (100K maps) and jobs that run for a long time without any issues. There is only one limitation that I know of - in secure clusters tokens expire after 7 days. In any case, please pursue this on the user mailing lists and create a bug when you are sure there is one. Closing this as invalid for now, please reopen if you disagree. > Large MapReduce Job stops responding > > > Key: YARN-2095 > URL: https://issues.apache.org/jira/browse/YARN-2095 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0 > Environment: CentOS 6.3 (x86_64) on vmware 10 running HDP-2.0.6 >Reporter: Clay McDonald >Priority: Blocker > > Very large jobs (7,455 Mappers and 999 Reducers) hang. Jobs run well but > logging to container logs stops after running 33 hours. The job appears to be > hung. The status of the job is "RUNNING". No error messages found in logs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2049: -- Attachment: YARN-2049.7.patch Fixed the javadoc warnings; the test failure is not related. See YARN-2075. > Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006705#comment-14006705 ] Hadoop QA commented on YARN-2073: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646425/yarn-2073-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3790//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3790//console This message is automatically generated. 
> FairScheduler starts preempting resources even with free resources on the > cluster > - > > Key: YARN-2073 > URL: https://issues.apache.org/jira/browse/YARN-2073 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, > yarn-2073-3.patch > > > Preemption should kick in only when the currently available slots don't match > the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006735#comment-14006735 ] Mayank Bansal commented on YARN-2074: - +1 LGTM Thanks, Mayank > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006743#comment-14006743 ] Jian He commented on YARN-2074: --- Talked with Vinod offline; the big problem with this is that even if we don't count AM preemption towards AM failures on the RM side, the MR AM itself checks the attempt id against the max-attempt count for recovery. A workaround is to reset the MAX-ATTEMPT env each time the AM is launched, which sounds a bit hacky though. > Preemption of AM containers shouldn't count towards AM failures > --- > > Key: YARN-2074 > URL: https://issues.apache.org/jira/browse/YARN-2074 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Jian He > Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch > > > One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM > containers getting preempted shouldn't count towards AM failures and thus > shouldn't eventually fail applications. > We should explicitly handle AM container preemption/kill as a separate issue > and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
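The distinction discussed above (a preempted AM should not burn an attempt, while a genuine crash should) can be sketched as a simple exit-status check. This is a hypothetical illustration, not the actual YARN-2074 patch; only the two exit-code constants mirror YARN's real ContainerExitStatus values.

```java
// Sketch: decide whether a finished AM container should count against
// yarn.resourcemanager.am.max-attempts. Class and method names are
// hypothetical; the constants mirror YARN's ContainerExitStatus.
class AmFailureSketch {
    public static final int PREEMPTED = -102; // ContainerExitStatus.PREEMPTED
    public static final int ABORTED = -100;   // ContainerExitStatus.ABORTED

    /** Returns true only for genuine AM failures, not preemption or framework kills. */
    public static boolean countsTowardsMaxAttempts(int containerExitStatus) {
        return containerExitStatus != PREEMPTED
            && containerExitStatus != ABORTED;
    }

    public static void main(String[] args) {
        // A preempted AM should not burn an attempt; a crash (e.g. exit 1) should.
        System.out.println(countsTowardsMaxAttempts(PREEMPTED)); // false
        System.out.println(countsTowardsMaxAttempts(1));         // true
    }
}
```

As the comment thread notes, an RM-side check like this is not sufficient on its own, because the MR AM independently compares its attempt id against the max-attempt count during recovery.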
[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2024: -- Priority: Critical (was: Major) Target Version/s: 2.4.1 > IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state. > > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Priority: Critical > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOException, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164) > ... 
> - At this point, the yarn-logs cleaner still thinks the thread is > aggregating, so the huge yarn-logs never get cleaned up for that application. -- This message was sent by Atlassian JIRA (v6.2#6252)
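The YARN-2024 failure mode above (an IOException leaves the TFile writer in a bad state, and every later append hits IllegalStateException) can be sketched as follows. The class and method names here are hypothetical stand-ins for AppLogAggregatorImpl/LogWriter, not the real YARN code:

```java
import java.io.IOException;

// Sketch: once an append fails, treat the writer as unusable and skip
// further appends instead of touching the half-written TFile again,
// so aggregation can still finish and the cleaner can run.
class GuardedLogWriter {
    private boolean broken = false;
    private int appended = 0;

    /** Pretends to append one container's logs; 'fail' simulates the IOException. */
    public void append(String containerId, boolean fail) {
        if (broken) {
            return; // writer is in a bad state; skip remaining containers
        }
        try {
            if (fail) {
                throw new IOException("disk error while writing " + containerId);
            }
            appended++;
        } catch (IOException e) {
            // Log WITH the stack trace (the reported code dropped it),
            // then mark the writer broken.
            e.printStackTrace();
            broken = true;
        }
    }

    public int appendedCount() { return appended; }
    public boolean isBroken() { return broken; }
}
```

The key design point is that the error path must both surface the stack trace and transition the aggregator to a terminal state, so the yarn-logs cleaner is not left believing aggregation is still in progress.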
[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2024: -- Component/s: log-aggregation > IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state. > > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Priority: Critical > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOException, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164) > ... 
> - At this point, the yarn-logs cleaner still thinks the thread is > aggregating, so the huge yarn-logs never get cleaned up for that application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.
[ https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2024: -- Issue Type: Sub-task (was: Bug) Parent: YARN-431 > IOException in AppLogAggregatorImpl does not give stacktrace and leaves > aggregated TFile in a bad state. > > > Key: YARN-2024 > URL: https://issues.apache.org/jira/browse/YARN-2024 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation >Affects Versions: 0.23.10, 2.4.0 >Reporter: Eric Payne >Priority: Critical > > Multiple issues were encountered when AppLogAggregatorImpl encountered an > IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating > yarn-logs for an application that had very large (>150G each) error logs. > - An IOException was encountered during the LogWriter#append call, and a > message was printed, but no stacktrace was provided. Message: "ERROR: > Couldn't upload logs for container_n_nnn_nn_nn. Skipping > this container." > - After the IOException, the TFile is in a bad state, so subsequent calls to > LogWriter#append fail with the following stacktrace: > 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR > org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread > Thread[LogAggregationService #17907,5,main] threw an Exception. > java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE > at > org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528) > at > org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164) > ... 
> - At this point, the yarn-logs cleaner still thinks the thread is > aggregating, so the huge yarn-logs never get cleaned up for that application. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism
[ https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006753#comment-14006753 ] Vinod Kumar Vavilapalli commented on YARN-2082: --- We should also consider some scalable solutions on HDFS itself - to post-process the logs automatically to reduce the file-count and maybe NMs forming a tree of aggregation (with network copy of logs) before hitting HDFS. IAC, the pluggability is sort of a dup of the proposal at YARN-1440 (albeit for a different reason)? > Support for alternative log aggregation mechanism > - > > Key: YARN-2082 > URL: https://issues.apache.org/jira/browse/YARN-2082 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Ming Ma > > I will post a more detailed design later. Here is the brief summary, and I would > like to get early feedback. > Problem Statement: > The current implementation of log aggregation creates one HDFS file for each > {application, nodemanager} pair. These files are relatively small, in the range of > 1-2 MB. In a large cluster with lots of applications and many nodemanagers, it > ends up creating lots of small files in HDFS. This creates pressure on the HDFS > NN in the following ways. > 1. It increases NN memory size. It is mitigated by having the history server > delete old log files in HDFS. > 2. Runtime RPC hit on HDFS. Each log aggregation file introduces several NN > RPCs such as create, getAdditionalBlock, complete, rename. When the cluster > is busy, such RPC load has an impact on NN performance. > In addition, to support non-MR applications on YARN, we might need to support > aggregation for long running applications. > Design choices: > 1. Don't aggregate all the logs, as in YARN-221. > 2. Create a dedicated HDFS namespace used only for log aggregation. > 3. Write logs to some key-value store like HBase. HBase's RPC hit on the NN will > be much less. > 4. Decentralize the application level log aggregation to NMs. 
All logs for a > given application are aggregated first by a dedicated NM before they are pushed > to HDFS. > 5. Have NMs aggregate logs on a regular basis; each of these log files will > have data from different applications and there needs to be some index for > quick lookup. > Proposal: > 1. Make yarn log aggregation pluggable for both the read and write path. Note > that Hadoop FileSystem provides an abstraction and we could ask an alternative > log aggregator to implement a compatible FileSystem, but that seems to be overkill. > 2. Provide a log aggregation plugin that writes to HBase. The schema design > needs to support efficient reads on a per-application as well as per > application+container basis; in addition, it shouldn't create hotspots in a > cluster where certain users might create more jobs than others. For example, > we can use hash($user + $applicationId) + containerId as the row key. -- This message was sent by Atlassian JIRA (v6.2#6252)
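The row-key idea sketched in the proposal above (hash the user+applicationId into a prefix so one heavy user does not hotspot a single HBase region, then append the container id for per-container lookups) could look roughly like this. The bucket count, hash, and key formatting are illustrative assumptions, not part of the proposal:

```java
// Sketch of the proposed HBase row key: a fixed-width hashed bucket prefix
// spreads rows across regions, while user/app/container components keep
// per-application and per-container scans cheap within a bucket.
class LogRowKey {
    static final int BUCKETS = 64; // illustrative choice

    public static String rowKey(String user, String applicationId, String containerId) {
        // floorMod keeps the bucket non-negative even for negative hashCodes.
        int bucket = Math.floorMod((user + applicationId).hashCode(), BUCKETS);
        // %02d gives a fixed-width prefix so keys sort correctly per bucket.
        return String.format("%02d/%s/%s/%s", bucket, user, applicationId, containerId);
    }
}
```

All containers of one application land in the same bucket, so a per-application read is a single prefix scan, while different applications from the same user spread across buckets.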
[jira] [Commented] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits
[ https://issues.apache.org/jira/browse/YARN-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006754#comment-14006754 ] Hong Zhiguo commented on YARN-1545: --- You mean we should define an upper bound on the number or length of fields inside the messages. Should these bounds be configurable, or pre-defined as constants? And what about the rate of messages? For example, a bad client performs getApplications queries at its full speed. > [Umbrella] Prevent DoS of YARN components by putting in limits > -- > > Key: YARN-1545 > URL: https://issues.apache.org/jira/browse/YARN-1545 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Vinod Kumar Vavilapalli > > I did a pass and found many places that can cause DoS on various YARN > services. Need to fix them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2073: --- Attachment: yarn-2073-4.patch Thanks for the review, Sandy. Updated the patch to reflect your suggestions, except the test refactoring. For the tests, it was easier to split them, and I think it is the right direction going forward. If you don't mind, I would like to leave the patch as is. > FairScheduler starts preempting resources even with free resources on the > cluster > - > > Key: YARN-2073 > URL: https://issues.apache.org/jira/browse/YARN-2073 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, > yarn-2073-3.patch, yarn-2073-4.patch > > > Preemption should kick in only when the currently available slots don't match > the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
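The intent stated in the YARN-2073 description above (preempt only when the currently available resources cannot satisfy the request) reduces to a guard like the following. This is a hypothetical sketch of the idea, not the actual yarn-2073 patch, and it only considers memory for simplicity:

```java
// Sketch: before the FairScheduler-style preemption logic kills containers,
// check whether the cluster's free pool could satisfy the starved request.
class PreemptionCheck {
    /** Preempt only when free resources genuinely cannot cover the request. */
    public static boolean shouldPreempt(long clusterFreeMemMb, long requestedMemMb) {
        return clusterFreeMemMb < requestedMemMb;
    }
}
```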
[jira] [Commented] (YARN-2049) Delegation token stuff for the timeline sever
[ https://issues.apache.org/jira/browse/YARN-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006774#comment-14006774 ] Hadoop QA commented on YARN-2049: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646431/YARN-2049.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.client.TestRMAdminCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3791//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3791//console This message is automatically generated. 
> Delegation token stuff for the timeline sever > - > > Key: YARN-2049 > URL: https://issues.apache.org/jira/browse/YARN-2049 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2049.1.patch, YARN-2049.2.patch, YARN-2049.3.patch, > YARN-2049.4.patch, YARN-2049.5.patch, YARN-2049.6.patch, YARN-2049.7.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006776#comment-14006776 ] Zhijie Shen commented on YARN-1936: --- Vinod, thanks for the review. See my responses below: bq. Make the event-put as one of the options "-put" Good point. I made use of CommandLine to build a simple CLI. bq. Add delegation token only if timeline-service is enabled. Added the check. bq. Also move this main to TimelineClientImpl Moved. bq. selectToken() can use a TimelineDelegationTokenSelector to find the token? Used the selector instead, and did the required refactoring. bq. Can we add a simple test to validate the addition of the Delegation Token to the client credentials? Added a test case. > Secured timeline client > --- > > Key: YARN-1936 > URL: https://issues.apache.org/jira/browse/YARN-1936 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch > > > TimelineClient should be able to talk to the timeline server with kerberos > authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1936) Secured timeline client
[ https://issues.apache.org/jira/browse/YARN-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1936: -- Attachment: YARN-1936.3.patch > Secured timeline client > --- > > Key: YARN-1936 > URL: https://issues.apache.org/jira/browse/YARN-1936 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-1936.1.patch, YARN-1936.2.patch, YARN-1936.3.patch > > > TimelineClient should be able to talk to the timeline server with kerberos > authentication or delegation token -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1545) [Umbrella] Prevent DoS of YARN components by putting in limits
[ https://issues.apache.org/jira/browse/YARN-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006777#comment-14006777 ] Vinod Kumar Vavilapalli commented on YARN-1545: --- I covered the details on the individual tickets - it's mostly about bounding buffers, lists, etc. When I filed this I was only focusing on application-level stuff. A bad client firing off RPCs in rapid fire can and should be addressed in the RPC layer itself, IMO. > [Umbrella] Prevent DoS of YARN components by putting in limits > -- > > Key: YARN-1545 > URL: https://issues.apache.org/jira/browse/YARN-1545 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Vinod Kumar Vavilapalli > > I did a pass and found many places that can cause DoS on various YARN > services. Need to fix them. -- This message was sent by Atlassian JIRA (v6.2#6252)
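The "bounding buffers, lists etc." approach discussed in this thread amounts to server-side validation of request field sizes before any processing happens. A minimal sketch, with illustrative limit values and a generic exception type (none of these names or numbers come from YARN itself):

```java
import java.util.List;

// Sketch: reject oversized request fields up front so a malicious or buggy
// client cannot force the server to buffer unbounded data.
class RequestLimits {
    public static final int MAX_LIST_SIZE = 1000;       // illustrative bound
    public static final int MAX_STRING_LEN = 8 * 1024;  // illustrative bound

    public static void checkList(String field, List<?> values) {
        if (values != null && values.size() > MAX_LIST_SIZE) {
            throw new IllegalArgumentException(
                field + " has " + values.size() + " entries, limit is " + MAX_LIST_SIZE);
        }
    }

    public static void checkString(String field, String value) {
        if (value != null && value.length() > MAX_STRING_LEN) {
            throw new IllegalArgumentException(
                field + " is " + value.length() + " chars, limit is " + MAX_STRING_LEN);
        }
    }
}
```

As the comment notes, this only covers oversized payloads; request-rate abuse is a separate concern belonging in the RPC layer.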
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006778#comment-14006778 ] Tsuyoshi OZAWA commented on YARN-1474: -- [~kkambatl], could you kick the Jenkins and check the latest patch? > Make schedulers services > > > Key: YARN-1474 > URL: https://issues.apache.org/jira/browse/YARN-1474 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Affects Versions: 2.3.0, 2.4.0 >Reporter: Sandy Ryza >Assignee: Tsuyoshi OZAWA > Attachments: YARN-1474.1.patch, YARN-1474.10.patch, > YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, > YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, > YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, > YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch > > > Schedulers currently have a reinitialize but no start and stop. Fitting them > into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2073) FairScheduler starts preempting resources even with free resources on the cluster
[ https://issues.apache.org/jira/browse/YARN-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006781#comment-14006781 ] Hadoop QA commented on YARN-2073: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646441/yarn-2073-4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3792//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3792//console This message is automatically generated. 
> FairScheduler starts preempting resources even with free resources on the > cluster > - > > Key: YARN-2073 > URL: https://issues.apache.org/jira/browse/YARN-2073 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Critical > Attachments: yarn-2073-0.patch, yarn-2073-1.patch, yarn-2073-2.patch, > yarn-2073-3.patch, yarn-2073-4.patch > > > Preemption should kick in only when the currently available slots don't match > the request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006793#comment-14006793 ] Hudson commented on YARN-2081: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > TestDistributedShell fails after YARN-1962 > -- > > Key: YARN-2081 > URL: https://issues.apache.org/jira/browse/YARN-2081 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 3.0.0, 2.4.1 >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.4.1 > > Attachments: YARN-2081.patch > > > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006801#comment-14006801 ] Hudson commented on YARN-1938: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java > Kerberos authentication for the timeline server > --- > > Key: YARN-1938 > URL: https://issues.apache.org/jira/browse/YARN-1938 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.5.0 > > Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006796#comment-14006796 ] Hudson commented on YARN-2089: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java > FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing > audience annotations > --- > > Key: YARN-2089 > URL: https://issues.apache.org/jira/browse/YARN-2089 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Anubhav Dhoot >Assignee: zhihai xu > Labels: newbie > Fix For: 2.5.0 > > Attachments: yarn-2089.patch > > > We should mark QueuePlacementPolicy and QueuePlacementRule with audience > annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006797#comment-14006797 ] Hudson commented on YARN-2017: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006798#comment-14006798 ] Hudson commented on YARN-1962: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Fix For: 2.4.1 > > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since the Timeline server is not yet mature or secure, enabling it by default might create some confusion. > We were playing with 2.4.0 and found a lot of connection-refused exceptions from the distributed shell example. Btw, we didn't run the TS because it is not secured yet. > Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra step for this new service is not worthwhile at this point. > This JIRA is to turn it off by default. > If there is an agreement, I can put up a simple patch for this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused > at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init> ... > {noformat}
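The workaround the reporter mentions (explicitly turning the timeline server off through the yarn-site config) is a single property; a sketch, assuming the `yarn.timeline-service.enabled` key used by YarnConfiguration at the time:

```xml
<!-- yarn-site.xml: disable the (then immature) timeline server -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```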
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006794#comment-14006794 ] Hudson commented on YARN-2050: -- FAILURE: Integrated in Hadoop-Yarn-trunk #563 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/563/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java > Fix LogCLIHelpers to create the correct FileContext > --- > > Key: YARN-2050 > URL: https://issues.apache.org/jira/browse/YARN-2050 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2050-2.patch, YARN-2050.patch > > > LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus > the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
[ https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006845#comment-14006845 ] Hadoop QA commented on YARN-2088: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3794//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3794//console This message is automatically generated. > Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder > > > Key: YARN-2088 > URL: https://issues.apache.org/jira/browse/YARN-2088 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: YARN-2088.v1.patch > > > Some fields (sets, lists) are added to the proto builder multiple times; we need to clear those fields before re-adding them, otherwise the resulting proto contains duplicated contents. -- This message was sent by Atlassian JIRA (v6.2#6252)
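The bug class described in YARN-2088 can be reproduced without protobuf at all; a minimal sketch with a hypothetical builder holding one repeated field (the real `GetApplicationsRequestPBImpl` merges into a generated protobuf builder, which is not reproduced here):

```java
import java.util.ArrayList;
import java.util.List;

public class MergeDemo {
    // Hypothetical stand-in for a generated protobuf builder with a repeated field.
    static class Builder {
        final List<String> applicationTypes = new ArrayList<>();
        void addAllApplicationTypes(List<String> types) { applicationTypes.addAll(types); }
        void clearApplicationTypes() { applicationTypes.clear(); }
    }

    static final Builder builder = new Builder();
    static final List<String> local = List.of("MAPREDUCE", "SPARK");

    // Buggy merge: runs on every serialization, so the repeated field grows each time.
    static void mergeBuggy() {
        builder.addAllApplicationTypes(local);
    }

    // Fixed merge: clear the repeated field before re-adding, as the patch does.
    static void mergeFixed() {
        builder.clearApplicationTypes();
        builder.addAllApplicationTypes(local);
    }

    public static void main(String[] args) {
        mergeBuggy();
        mergeBuggy();
        System.out.println(builder.applicationTypes.size()); // duplicated entries: 4
        mergeFixed();
        mergeFixed();
        System.out.println(builder.applicationTypes.size()); // stable: 2
    }
}
```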
[jira] [Commented] (YARN-2030) Use StateMachine to simplify handleStoreEvent() in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006858#comment-14006858 ] Hadoop QA commented on YARN-2030: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645932/YARN-2030.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3793//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3793//console This message is automatically generated. 
> Use StateMachine to simplify handleStoreEvent() in RMStateStore > --- > > Key: YARN-2030 > URL: https://issues.apache.org/jira/browse/YARN-2030 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Junping Du >Assignee: Binglin Chang > Attachments: YARN-2030.v1.patch, YARN-2030.v2.patch > > > Now the logic to handle different store events in handleStoreEvent() is as follows: > {code} > if (event.getType().equals(RMStateStoreEventType.STORE_APP) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > ... > try { > if (event.getType().equals(RMStateStoreEventType.STORE_APP)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT) > || event.getType().equals(RMStateStoreEventType.UPDATE_APP_ATTEMPT)) { > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > ... > if (event.getType().equals(RMStateStoreEventType.STORE_APP_ATTEMPT)) { > ... > } else { > ... > } > } > ... > } else if (event.getType().equals(RMStateStoreEventType.REMOVE_APP)) { > ... > } else { > ... > } > } > {code} > This is not only confusing to people but also easily leads to mistakes. We may leverage a state machine to simplify this, even with no state transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
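A full state machine is the heavyweight fix; even a plain per-event-type dispatch table removes the repeated `event.getType().equals(...)` chains quoted above. A minimal sketch with a hypothetical subset of the event types (the real RMStateStore handlers are not reproduced here):

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

public class DispatchDemo {
    // Hypothetical subset of RMStateStoreEventType.
    enum EventType { STORE_APP, UPDATE_APP, STORE_APP_ATTEMPT, UPDATE_APP_ATTEMPT, REMOVE_APP }

    static final StringBuilder log = new StringBuilder();

    // One handler per event type replaces the nested equals() chains.
    static final Map<EventType, Consumer<String>> handlers = new EnumMap<>(EventType.class);
    static {
        handlers.put(EventType.STORE_APP,  app -> log.append("store:").append(app).append(';'));
        handlers.put(EventType.UPDATE_APP, app -> log.append("update:").append(app).append(';'));
        handlers.put(EventType.REMOVE_APP, app -> log.append("remove:").append(app).append(';'));
    }

    // Flat dispatch: look up the handler once instead of chaining type comparisons.
    static void handleStoreEvent(EventType type, String app) {
        Consumer<String> h = handlers.get(type);
        if (h == null) {
            log.append("unknown:").append(type).append(';');
            return;
        }
        h.accept(app);
    }

    public static void main(String[] args) {
        handleStoreEvent(EventType.STORE_APP, "app_1");
        handleStoreEvent(EventType.UPDATE_APP, "app_1");
        handleStoreEvent(EventType.STORE_APP_ATTEMPT, "app_1"); // no handler registered in this sketch
        System.out.println(log);
    }
}
```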
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006871#comment-14006871 ] Hudson commented on YARN-1962: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Fix For: 2.4.1 > > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since the Timeline server is not yet mature or secure, enabling it by default might create some confusion. > We were playing with 2.4.0 and found a lot of connection-refused exceptions from the distributed shell example. Btw, we didn't run the TS because it is not secured yet. > Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra step for this new service is not worthwhile at this point. > This JIRA is to turn it off by default. > If there is an agreement, I can put up a simple patch for this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused > at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init> ... > {noformat}
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006870#comment-14006870 ] Hudson commented on YARN-2017: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006866#comment-14006866 ] Hudson commented on YARN-2081: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > TestDistributedShell fails after YARN-1962 > -- > > Key: YARN-2081 > URL: https://issues.apache.org/jira/browse/YARN-2081 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 3.0.0, 2.4.1 >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.4.1 > > Attachments: YARN-2081.patch > > > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006874#comment-14006874 ] Hudson commented on YARN-1938: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java > Kerberos authentication for the timeline server > --- > > Key: YARN-1938 > URL: https://issues.apache.org/jira/browse/YARN-1938 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.5.0 > > Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006867#comment-14006867 ] Hudson commented on YARN-2050: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java > Fix LogCLIHelpers to create the correct FileContext > --- > > Key: YARN-2050 > URL: https://issues.apache.org/jira/browse/YARN-2050 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2050-2.patch, YARN-2050.patch > > > LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus > the FileContext created isn't necessarily the FileContext for remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006869#comment-14006869 ] Hudson commented on YARN-2089: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java > FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing > audience annotations > --- > > Key: YARN-2089 > URL: https://issues.apache.org/jira/browse/YARN-2089 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Anubhav Dhoot >Assignee: zhihai xu > Labels: newbie > Fix For: 2.5.0 > > Attachments: yarn-2089.patch > > > We should mark QueuePlacementPolicy and QueuePlacementRule with audience > annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1962) Timeline server is enabled by default
[ https://issues.apache.org/jira/browse/YARN-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006893#comment-14006893 ] Hudson commented on YARN-1962: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > Timeline server is enabled by default > - > > Key: YARN-1962 > URL: https://issues.apache.org/jira/browse/YARN-1962 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.0 >Reporter: Mohammad Kamrul Islam >Assignee: Mohammad Kamrul Islam > Fix For: 2.4.1 > > Attachments: YARN-1962.1.patch, YARN-1962.2.patch > > > Since the Timeline server is not yet mature or secure, enabling it by default might create some confusion. > We were playing with 2.4.0 and found a lot of connection-refused exceptions from the distributed shell example. Btw, we didn't run the TS because it is not secured yet. > Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra step for this new service is not worthwhile at this point. > This JIRA is to turn it off by default. > If there is an agreement, I can put up a simple patch for this. > {noformat} > 14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server. 
> com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused > at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149) > at com.sun.jersey.api.client.Client.handle(Client.java:648) > at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670) > at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) > at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131) > at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515) > at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281) > Caused by: java.net.ConnectException: Connection refused > at java.net.PlainSocketImpl.socketConnect(Native Method) > at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) > at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198) > at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) > at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > at java.net.Socket.connect(Socket.java:579) > at java.net.Socket.connect(Socket.java:528) > at sun.net.NetworkClient.doConnect(NetworkClient.java:180) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) > at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) > at sun.net.www.http.HttpClient.<init> ... > {noformat}
[jira] [Commented] (YARN-2089) FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations
[ https://issues.apache.org/jira/browse/YARN-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006891#comment-14006891 ] Hudson commented on YARN-2089: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2089. FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing audience annotations. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596765) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueuePlacementRule.java > FairScheduler: QueuePlacementPolicy and QueuePlacementRule are missing > audience annotations > --- > > Key: YARN-2089 > URL: https://issues.apache.org/jira/browse/YARN-2089 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.4.0 >Reporter: Anubhav Dhoot >Assignee: zhihai xu > Labels: newbie > Fix For: 2.5.0 > > Attachments: yarn-2089.patch > > > We should mark QueuePlacementPolicy and QueuePlacementRule with audience > annotations @Private @Unstable -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006888#comment-14006888 ] Hudson commented on YARN-2081: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2081. Fixed TestDistributedShell failure after YARN-1962. Contributed by Zhiguo Hong. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596724) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java > TestDistributedShell fails after YARN-1962 > -- > > Key: YARN-2081 > URL: https://issues.apache.org/jira/browse/YARN-2081 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Affects Versions: 3.0.0, 2.4.1 >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Fix For: 2.4.1 > > Attachments: YARN-2081.patch > > > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2050) Fix LogCLIHelpers to create the correct FileContext
[ https://issues.apache.org/jira/browse/YARN-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006889#comment-14006889 ] Hudson commented on YARN-2050: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2050. Fix LogCLIHelpers to create the correct FileContext. Contributed by Ming Ma (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java > Fix LogCLIHelpers to create the correct FileContext > --- > > Key: YARN-2050 > URL: https://issues.apache.org/jira/browse/YARN-2050 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.5.0 > > Attachments: YARN-2050-2.patch, YARN-2050.patch > > > LogCLIHelpers calls FileContext.getFileContext() without any parameters. Thus > the FileContext created isn't necessarily the FileContext for the remote log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1938) Kerberos authentication for the timeline server
[ https://issues.apache.org/jira/browse/YARN-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006896#comment-14006896 ] Hudson commented on YARN-1938: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-1938. Added kerberos login for the Timeline Server. Contributed by Zhijie Shen. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596710) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java > Kerberos authentication for the timeline server > --- > > Key: YARN-1938 > URL: https://issues.apache.org/jira/browse/YARN-1938 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.5.0 > > Attachments: YARN-1938.1.patch, YARN-1938.2.patch, YARN-1938.3.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2017) Merge some of the common lib code in schedulers
[ https://issues.apache.org/jira/browse/YARN-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006892#comment-14006892 ] Hudson commented on YARN-2017: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) YARN-2017. Merged some of the common scheduler code. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596753) * /hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hado
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006908#comment-14006908 ] Sunil G commented on YARN-1408: --- bq. we may change it to decrement the resource request only when the container is pulled by the AM ? As [~jianhe] mentioned, this can create problems with subsequent NM heartbeats. Also, I agree that the ALLOCATED state is the best place to do preemption, but this race condition can still occur there. CapacityScheduler raises a KILL event for the RMContainer (for preemption). So a solution may be to recreate the resource request if the RMContainer state is ALLOCATED/ACQUIRED here. > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, > Yarn-1408.4.patch, Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Submit a big jobA to queue a which uses the full cluster capacity > Step 2: Submit a jobB to queue b which would use less than 20% of cluster > capacity > JobA tasks which use queue b capacity have been preempted and killed. > This caused the problem below: > 1. A new container got allocated for jobA in Queue A as per a node update > from an NM. > 2. This container was immediately killed by preemption. > Here the ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached the RM. 
> ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
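The race and the proposed fix above can be sketched with a deliberately simplified, hypothetical model of the RMContainer lifecycle (class and method names here are invented for illustration; this is not the actual YARN state machine): when preemption kills a container the AM has not yet run, the resource request is re-created so the application does not lose it, and a late ACQUIRED event at KILLED is the invalid transition the stack trace reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the RMContainer lifecycle discussed above.
// Names are invented for illustration; this is not the real YARN code.
class MiniContainer {
    enum State { ALLOCATED, ACQUIRED, RUNNING, KILLED }

    State state = State.ALLOCATED;
    final List<String> recoveredRequests = new ArrayList<>();

    // The AM pulls the container on its next heartbeat.
    void acquire() {
        if (state == State.KILLED) {
            // Without further handling, this is the
            // "Invalid event: ACQUIRED at KILLED" path from the logs.
            throw new IllegalStateException("Invalid event: ACQUIRED at " + state);
        }
        state = State.ACQUIRED;
    }

    // Preemption kills the container; per the suggested fix, re-create the
    // resource request when the container never ran (ALLOCATED/ACQUIRED).
    void kill(String resourceRequest) {
        if (state == State.ALLOCATED || state == State.ACQUIRED) {
            recoveredRequests.add(resourceRequest);
        }
        state = State.KILLED;
    }
}

public class PreemptionRaceSketch {
    public static void main(String[] args) {
        MiniContainer c = new MiniContainer();
        c.kill("rsrc-for-jobA");       // preemption wins the race
        System.out.println(c.recoveredRequests.size()); // request re-created
        try {
            c.acquire();               // late AM heartbeat arrives
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```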
[jira] [Created] (YARN-2096) testQueueMetricsOnRMRestart has race condition
Anubhav Dhoot created YARN-2096: --- Summary: testQueueMetricsOnRMRestart has race condition Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2096) testQueueMetricsOnRMRestart has race condition
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-2096: --- Assignee: Anubhav Dhoot > testQueueMetricsOnRMRestart has race condition > -- > > Key: YARN-2096 > URL: https://issues.apache.org/jira/browse/YARN-2096 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart > fails randomly because of a race condition. > The test validates that metrics are incremented, but does not wait for all > transitions to finish before checking for the values. > It also resets metrics after kicking off recovery of second RM. The metrics > that need to be incremented race with this reset causing test to fail > randomly. > We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2096) testQueueMetricsOnRMRestart has race condition
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2096: Attachment: YARN-2096.patch Fixed two race conditions: 1) waiting for the appropriate transitions before checking metrics, and 2) resetting metrics before the events are triggered. > testQueueMetricsOnRMRestart has race condition > -- > > Key: YARN-2096 > URL: https://issues.apache.org/jira/browse/YARN-2096 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2096.patch > > > org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart > fails randomly because of a race condition. > The test validates that metrics are incremented, but does not wait for all > transitions to finish before checking for the values. > It also resets metrics after kicking off recovery of second RM. The metrics > that need to be incremented race with this reset causing test to fail > randomly. > We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
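The first fix — waiting for asynchronous transitions instead of asserting on metrics immediately — is a common pattern in such tests. A minimal, JDK-only sketch of a polling wait helper (not the actual test code; the metric name is invented for illustration) might look like:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Minimal polling helper in the spirit of the fix: block until a condition
// holds (e.g. a metric reached its expected value) or a timeout expires.
public class WaitForSketch {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(10); // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) {
        AtomicInteger appsSubmitted = new AtomicInteger();
        // Simulate an asynchronous transition that bumps the metric later.
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            appsSubmitted.incrementAndGet();
        }).start();
        // Instead of asserting immediately (racy), wait for the transition.
        boolean ok = waitFor(() -> appsSubmitted.get() == 1, 2000);
        System.out.println(ok);
    }
}
```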