[jira] [Updated] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-8739:
    Attachment: YARN-8739-YARN-3409.002.patch

> Fix jenkins issues for Node Attributes branch
> ---------------------------------------------
>
>         Key: YARN-8739
>         URL: https://issues.apache.org/jira/browse/YARN-8739
>     Project: Hadoop YARN
>  Issue Type: Sub-task
>    Reporter: Sunil Govindan
>    Assignee: Sunil Govindan
>    Priority: Major
> Attachments: YARN-8739-YARN-3409.001.patch, YARN-8739-YARN-3409.002.patch
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16600427#comment-16600427 ]

Sunil Govindan commented on YARN-8739:

Thanks [~bibinchundatt]. Updating v2 patch.

> Fix jenkins issues for Node Attributes branch
> ---------------------------------------------
>
>         Key: YARN-8739
>         URL: https://issues.apache.org/jira/browse/YARN-8739
>     Project: Hadoop YARN
>  Issue Type: Sub-task
>    Reporter: Sunil Govindan
>    Assignee: Sunil Govindan
>    Priority: Major
> Attachments: YARN-8739-YARN-3409.001.patch, YARN-8739-YARN-3409.002.patch
>
[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599858#comment-16599858 ]

Hadoop QA commented on YARN-6972:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 42s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 10s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | checkstyle | 0m 38s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | shadedclient | 13m 49s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 31s | trunk passed |
| +1 | javadoc | 0m 33s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 52s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 37s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 7s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 19s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 77m 46s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 134m 51s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6972 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938016/YARN-6972.016.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8142647747c5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6edf3d2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/21738/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/21738/testReport/ |
| Max. process+thread count | 926 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-projec
[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-4677:
    Fix Version/s: 3.0.4

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --------------------------------------------------------------------------
>
>              Key: YARN-4677
>              URL: https://issues.apache.org/jira/browse/YARN-4677
>          Project: Hadoop YARN
>       Issue Type: Sub-task
>       Components: graceful, resourcemanager, scheduler
> Affects Versions: 2.7.1
>         Reporter: Brook Zhou
>         Assignee: Wilfred Spiegelenburg
>         Priority: Major
>          Fix For: 3.2.0, 3.1.1, 2.9.2, 3.0.4
>
> Attachments: YARN-4677-branch-2.001.patch, YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
> When a node is in the decommissioning state, there is a time window between completedContainer() and the RMNodeResourceUpdateEvent being handled in scheduler.nodeUpdate (YARN-3223).
> So if a scheduling effort happens within this window, a new container can still be allocated on this node. The even worse case is a scheduling effort that happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated to the SchedulerNode - then the total resource is lower than the used resource and the available resource becomes negative.
>
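The negative-available window described above can be sketched in a few lines. This is a hedged illustration, not YARN code: the class and method names below are hypothetical (the real accounting lives in SchedulerNode), and the memory figures are made up.

```java
// Minimal sketch of the YARN-4677 symptom: available = total - used, with
// nothing clamping the result, so a late total-resource update during
// decommissioning can drive "available" negative.
public class NodeResourceRace {
    // Hypothetical stand-in for SchedulerNode's available-resource accounting.
    static long availableMB(long totalMB, long usedMB) {
        return totalMB - usedMB;
    }

    public static void main(String[] args) {
        long used = 4096;  // containers already allocated on the node

        // Before the RMNodeResourceUpdateEvent is applied:
        System.out.println(availableMB(8192, used));  // 4096

        // Decommissioning shrinks the node's total, but "used" still reflects
        // a container allocated inside the race window:
        System.out.println(availableMB(2048, used));  // -2048
    }
}
```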
[jira] [Updated] (YARN-8382) cgroup file leak in NM
[ https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8382:
    Fix Version/s: 3.0.4

> cgroup file leak in NM
> ----------------------
>
>         Key: YARN-8382
>         URL: https://issues.apache.org/jira/browse/YARN-8382
>     Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: we wrote a container with a shutdownHook containing a piece of code like "while(true) sleep(100)". When yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms < yarn.nodemanager.sleep-delay-before-sigkill.ms, the cgroup file leak happens; when yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms > yarn.nodemanager.sleep-delay-before-sigkill.ms, the cgroup file is deleted successfully.
>    Reporter: Hu Ziqian
>    Assignee: Hu Ziqian
>    Priority: Major
>     Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8382-branch-2.8.3.001.patch, YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch
>
> As Jiandan said in YARN-6562, the NM may time out deleting a container's cgroup files, with logs like:
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to delete for 1000ms
>
> One situation we found is that when yarn.nodemanager.sleep-delay-before-sigkill.ms is set bigger than yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms, the cgroup file leak happens.
>
> A container's process tree looks like:
> bash(16097)───java(16099)─┬─{java}(16100)
>                           ├─{java}(16101)
>                           ├─{java}(16102)
>
> When the NM kills a container, it sends kill -15 -pid to kill the container's process group. The bash process exits when it receives SIGTERM, but the java process may still be doing some work (shutdownHook etc.) and does not exit until it receives SIGKILL. When the bash process exits, CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms is reached, the java processes may still be running, cgroup/tasks is still not empty, and a cgroup file leak results.
>
> We add a condition that yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms must be bigger than yarn.nodemanager.sleep-delay-before-sigkill.ms to solve this problem.
>
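The timing precondition this report derives can be stated as a single comparison. The sketch below is a hedged illustration, not the actual patch; the class and helper name are hypothetical, only the two property names come from the issue.

```java
// Sketch of the YARN-8382 precondition: cgroup deletion can only succeed once
// every task in the cgroup has exited, and tasks are only guaranteed gone
// sleep-delay-before-sigkill ms after SIGTERM (when SIGKILL is sent). So the
// cgroups delete timeout must exceed the sigkill delay to avoid a leak.
public class CgroupTimingCheck {
    // yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
    // must be bigger than yarn.nodemanager.sleep-delay-before-sigkill.ms
    static boolean deleteCanSucceed(long deleteTimeoutMs, long sigkillDelayMs) {
        return deleteTimeoutMs > sigkillDelayMs;
    }

    public static void main(String[] args) {
        System.out.println(deleteCanSucceed(2000, 250));  // true: tasks gone before timeout
        System.out.println(deleteCanSucceed(1000, 5000)); // false: the leak scenario above
    }
}
```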
[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath
[ https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-7190:
    Affects Version/s: (was: 3.0.x)

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath
> ------------------------------------------------------------------------------------
>
>              Key: YARN-7190
>              URL: https://issues.apache.org/jira/browse/YARN-7190
>          Project: Hadoop YARN
>       Issue Type: Sub-task
>       Components: timelineclient, timelinereader, timelineserver
> Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3
>         Reporter: Vrushali C
>         Assignee: Varun Saxena
>         Priority: Major
>          Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, YARN-7190.01.patch, YARN-7190.02.patch
>
> [~jlowe] had a good observation about the user classpath getting extra jars in hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2.x's version of the HBase jars instead of the ones they shipped with their job, it could be a problem.
> So when TSv2 is to be used in 2.x, the hbase related jars should go only on the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}
[jira] [Updated] (YARN-8354) SingleConstraintAppPlacementAllocator's allocate does not decPendingResource
[ https://issues.apache.org/jira/browse/YARN-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8354:
    Affects Version/s: (was: 3.0.x)

> SingleConstraintAppPlacementAllocator's allocate does not decPendingResource
> ----------------------------------------------------------------------------
>
>       Key: YARN-8354
>       URL: https://issues.apache.org/jira/browse/YARN-8354
>   Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
>   Reporter: LongGang Chen
>   Priority: Major
>
> SingleConstraintAppPlacementAllocator.allocate() does not call decPendingResource; it only reduces ResourceSizing.numAllocations by one.
> Maybe we should change decreasePendingNumAllocation() from:
> {code:java}
> private void decreasePendingNumAllocation() {
>   // Deduct pending #allocations by 1
>   ResourceSizing sizing = schedulingRequest.getResourceSizing();
>   sizing.setNumAllocations(sizing.getNumAllocations() - 1);
> }
> {code}
> to:
> {code:java}
> private void decreasePendingNumAllocation() {
>   // Deduct pending #allocations by 1
>   ResourceSizing sizing = schedulingRequest.getResourceSizing();
>   sizing.setNumAllocations(sizing.getNumAllocations() - 1);
>   // Deduct pending resource of app and queue
>   appSchedulingInfo.decPendingResource(
>       schedulingRequest.getNodeLabelExpression(),
>       sizing.getResources());
> }
> {code}
>
[jira] [Updated] (YARN-8353) LightWeightResource's hashCode function is different from parent class
[ https://issues.apache.org/jira/browse/YARN-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8353:
    Affects Version/s: (was: 3.0.x)

> LightWeightResource's hashCode function is different from parent class
> ----------------------------------------------------------------------
>
>       Key: YARN-8353
>       URL: https://issues.apache.org/jira/browse/YARN-8353
>   Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
>   Reporter: LongGang Chen
>   Priority: Major
>
> LightWeightResource's hashCode function is different from its parent class's.
> One of the consequences is: ContainerUpdateContext.removeFromOutstandingUpdate will not work correctly, and ContainerUpdateContext.outstandingIncreases will contain stale data.
> A simple test:
> {code:java}
> public void testHashCode() throws Exception {
>   Resource resource = Resources.createResource(10, 10);
>   Resource resource1 = new ResourcePBImpl();
>   resource1.setMemorySize(10L);
>   resource1.setVirtualCores(10);
>   int x = resource.hashCode();
>   int y = resource1.hashCode();
>   Assert.assertEquals(x, y);
> }
> {code}
>
[jira] [Updated] (YARN-8355) container update error because of competition
[ https://issues.apache.org/jira/browse/YARN-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8355:
    Affects Version/s: (was: 3.0.x)

> container update error because of competition
> ---------------------------------------------
>
>       Key: YARN-8355
>       URL: https://issues.apache.org/jira/browse/YARN-8355
>   Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
>   Reporter: LongGang Chen
>   Priority: Major
>
> First, a quick walk through the update logic, with increase as the example:
> * 1: normal work in ApplicationMasterService, DefaultAMSProcessor.
> * 2: CapacityScheduler.allocate calls AbstractYarnScheduler.handleContainerUpdates.
> * 3: AbstractYarnScheduler.handleContainerUpdates calls handleIncreaseRequests, then ContainerUpdateContext.checkAndAddToOutstandingIncreases.
> * 4: cancel and add new: checkAndAddToOutstandingIncreases checks the inc update for this container; if there is an outstanding inc, it cancels it by calling appSchedulingInfo.allocate(...) to allocate a dummy container. If the update is a fresh one, it calls appSchedulingInfo.updateResourceRequests to add a new request. The capacity of this new request is the gap between the existing container and the capacity of the updateRequest; for example, if the original capacity is , and the target capacity of the UpdateRequest is , the gap [the capacity of the new request added to appSchedulingInfo] is .
> * 5: swap temp container and existing container: CapacityScheduler.allocate calls FiCaSchedulerApp.getAllocation(...), which calls SchedulerApplicationAttempt.pullNewlyIncreasedContainers, then ContainerUpdateContext.swapContainer. swapContainer swaps the newly allocated temp inc container with the existing container; for example, with original capacity and a temp inc container of capacity , the updated existing container has capacity , and the inc update is done.
>
> The problem is: if we send the inc update twice for a certain container - for example, send inc to , then send inc with a new target - the final updated capacity is uncertain.
> Scenario one:
> * 1: send inc update from to .
> * 2: the scheduler approves and commits it, so app.liveContainers has this temp inc container with capacity in it.
> * 3: send inc with a new target ; a new resourceRequest with capacity is added to appSchedulingInfo, and the first temp container is swapped; after that, the existing container has the new capacity .
> * 4: the scheduler approves the second temp resourceRequest and allocates a second temp container with capacity .
> * 5: swap the second temp inc container. So the updated capacity of this existing container is = , but the wanted capacity is .
> Scenario two:
> * 1: send inc update from to .
> * 2: the scheduler approves it, but the temp container with capacity is queued in commitService, waiting to commit.
> * 3: send inc with a new target ; a new resourceRequest is added to appSchedulingInfo, but with the same SchedulerRequestKey.
> * 4: the first temp container commits; app.apply calls appSchedulingInfo.allocate to reduce the pending number, and in this situation it cancels the second inc request.
> * 5: swap the first temp inc container. The updated existing container's capacity is , but the wanted capacity is .
> Two key points:
> * 1: when ContainerUpdateContext.checkAndAddToOutstandingIncreases cancels the previous inc request and puts the current inc request, it uses the same SchedulerRequestKey; this action races with app.apply - as in scenario two, app.apply cancels the second inc update's request.
> * 2: ContainerUpdateContext.swapContainer does not check whether the update target has changed.
> How to fix:
> * 1: after ContainerUpdateContext.checkAndAddToOutstandingIncreases cancels the previous inc update request, use a new SchedulerRequestKey for the current inc update request. We can add a new field createTime to distinguish them; the default value of createTime is 0.
> * 2: change ContainerUpdateContext.swapContainer to checkAndSwapContainer: check whether the update target has changed, and if it has, ignore this temp container and release it. As in scenario one, when swapping the first temp inc container we find that performing the swap would produce an updated capacity different from the newly targeted capacity, so we skip the swap and release the temp container.
>
[jira] [Updated] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md
[ https://issues.apache.org/jira/browse/YARN-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8568:
    Affects Version/s: (was: 3.0.x)

> Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md
> -------------------------------------------------------------------------------------------
>
>       Key: YARN-8568
>       URL: https://issues.apache.org/jira/browse/YARN-8568
>   Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
>   Reporter: Antal Bálint Steinbach
>   Assignee: Antal Bálint Steinbach
>   Priority: Minor
>    Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-8568.001.patch
>
> yarn.resourcemanager.zk-address is deprecated; use hadoop.zk.address instead.
> In the example, the deprecated "yarn.resourcemanager.zk-address" is used, while in the description the property name is correct: "hadoop.zk.address".
>
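For reference, the replacement the issue asks for would look like the fragment below in the HA configuration example. The property name comes from the issue itself; the ZooKeeper host list is a placeholder, not taken from ResourceManagerHA.md.

```xml
<!-- hadoop.zk.address supersedes the deprecated
     yarn.resourcemanager.zk-address property.
     Hostnames below are illustrative placeholders. -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```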
[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-4677:
    Fix Version/s: (was: 3.0.x)

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --------------------------------------------------------------------------
>
>              Key: YARN-4677
>              URL: https://issues.apache.org/jira/browse/YARN-4677
>          Project: Hadoop YARN
>       Issue Type: Sub-task
>       Components: graceful, resourcemanager, scheduler
> Affects Versions: 2.7.1
>         Reporter: Brook Zhou
>         Assignee: Wilfred Spiegelenburg
>         Priority: Major
>          Fix For: 3.2.0, 3.1.1, 2.9.2
>
> Attachments: YARN-4677-branch-2.001.patch, YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
> When a node is in the decommissioning state, there is a time window between completedContainer() and the RMNodeResourceUpdateEvent being handled in scheduler.nodeUpdate (YARN-3223).
> So if a scheduling effort happens within this window, a new container can still be allocated on this node. The even worse case is a scheduling effort that happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated to the SchedulerNode - then the total resource is lower than the used resource and the available resource becomes negative.
>
[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
[ https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-4677:
    Target Version/s: 3.1.1, 3.2.0, 2.9.2 (was: 3.2.0, 3.1.1, 2.9.2, 3.0.x)

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --------------------------------------------------------------------------
>
>              Key: YARN-4677
>              URL: https://issues.apache.org/jira/browse/YARN-4677
>          Project: Hadoop YARN
>       Issue Type: Sub-task
>       Components: graceful, resourcemanager, scheduler
> Affects Versions: 2.7.1
>         Reporter: Brook Zhou
>         Assignee: Wilfred Spiegelenburg
>         Priority: Major
>          Fix For: 3.2.0, 3.1.1, 2.9.2
>
> Attachments: YARN-4677-branch-2.001.patch, YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
> When a node is in the decommissioning state, there is a time window between completedContainer() and the RMNodeResourceUpdateEvent being handled in scheduler.nodeUpdate (YARN-3223).
> So if a scheduling effort happens within this window, a new container can still be allocated on this node. The even worse case is a scheduling effort that happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated to the SchedulerNode - then the total resource is lower than the used resource and the available resource becomes negative.
>
[jira] [Updated] (YARN-8382) cgroup file leak in NM
[ https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated YARN-8382:
    Fix Version/s: (was: 3.0.x)

> cgroup file leak in NM
> ----------------------
>
>         Key: YARN-8382
>         URL: https://issues.apache.org/jira/browse/YARN-8382
>     Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: we wrote a container with a shutdownHook containing a piece of code like "while(true) sleep(100)". When yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms < yarn.nodemanager.sleep-delay-before-sigkill.ms, the cgroup file leak happens; when yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms > yarn.nodemanager.sleep-delay-before-sigkill.ms, the cgroup file is deleted successfully.
>    Reporter: Hu Ziqian
>    Assignee: Hu Ziqian
>    Priority: Major
>     Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8382-branch-2.8.3.001.patch, YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch
>
> As Jiandan said in YARN-6562, the NM may time out deleting a container's cgroup files, with logs like:
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to delete for 1000ms
>
> One situation we found is that when yarn.nodemanager.sleep-delay-before-sigkill.ms is set bigger than yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms, the cgroup file leak happens.
>
> A container's process tree looks like:
> bash(16097)───java(16099)─┬─{java}(16100)
>                           ├─{java}(16101)
>                           ├─{java}(16102)
>
> When the NM kills a container, it sends kill -15 -pid to kill the container's process group. The bash process exits when it receives SIGTERM, but the java process may still be doing some work (shutdownHook etc.) and does not exit until it receives SIGKILL. When the bash process exits, CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms is reached, the java processes may still be running, cgroup/tasks is still not empty, and a cgroup file leak results.
>
> We add a condition that yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms must be bigger than yarn.nodemanager.sleep-delay-before-sigkill.ms to solve this problem.
>
[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo
[ https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanuj Nayak updated YARN-6972:
    Attachment: YARN-6972.016.patch

> Adding RM ClusterId in AppInfo
> ------------------------------
>
>       Key: YARN-6972
>       URL: https://issues.apache.org/jira/browse/YARN-6972
>   Project: Hadoop YARN
> Issue Type: Sub-task
>   Reporter: Giovanni Matteo Fumarola
>   Assignee: Tanuj Nayak
>   Priority: Major
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch, YARN-6972.015.patch, YARN-6972.016.patch, YARN-6972.016.patch
>
[jira] [Commented] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599804#comment-16599804 ]

genericqa commented on YARN-8739:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || YARN-3409 Compile Tests ||
| 0 | mvndep | 2m 21s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 51s | YARN-3409 passed |
| +1 | compile | 22m 47s | YARN-3409 passed |
| +1 | checkstyle | 3m 40s | YARN-3409 passed |
| +1 | mvnsite | 7m 40s | YARN-3409 passed |
| +1 | shadedclient | 24m 55s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 10m 20s | YARN-3409 passed |
| +1 | javadoc | 5m 3s | YARN-3409 passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 59s | the patch passed |
| +1 | compile | 20m 17s | the patch passed |
| +1 | javac | 20m 17s | the patch passed |
| -0 | checkstyle | 3m 37s | root: The patch generated 2 new + 143 unchanged - 50 fixed = 145 total (was 193) |
| +1 | mvnsite | 6m 49s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 37s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 11m 40s | the patch passed |
| +1 | javadoc | 5m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 19s | hadoop-common in the patch passed. |
| +1 | unit | 1m 0s | hadoop-yarn-api in the patch passed. |
| +1 | unit | 3m 37s | hadoop-yarn-common in the patch passed. |
| +1 | unit | 2m 38s | hadoop-yarn-server-common in the patch passed. |
| +1 | unit | 19m 39s | hadoop-yarn-server-nodemanager in the patch passed. |
| -1 | unit | 72m 17s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | unit | 24m 56s | hadoop-yarn-client in the patch passed. |
| +1 | unit | 10m 37s | hadoop-sls in the patch passed. |
| +1 | asflicense | 0m 56s | The patch does not gen
[jira] [Resolved] (YARN-2097) Documentation: health check return status
[ https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2097. Resolution: Won't Fix > Documentation: health check return status > - > > Key: YARN-2097 > URL: https://issues.apache.org/jira/browse/YARN-2097 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Allen Wittenauer >Assignee: Rekha Joshi >Priority: Major > Labels: newbie > Attachments: YARN-2097.1.patch > > > We need to document that the output of the health check script is ignored on > non-0 exit status. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
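The behavior YARN-2097 asks to have documented — the NodeManager only inspects the script's output when it exits 0, and ignores output on a non-0 exit — can be sketched with a hypothetical health-check script. This is an illustration only: the paths, thresholds, and function name are made up, and the ERROR-line convention is YARN's usual signal for marking a node unhealthy.

```shell
# Hypothetical node health-check script. The NodeManager inspects the
# script's output only when the exit status is 0; on a non-0 exit the
# output is ignored entirely (the behavior this JIRA wants documented).
check_disk() {
  local usage
  usage=$(df -P / | awk 'NR==2 {gsub(/%/, ""); print $5}')
  # If df/awk produced nothing usable, exit non-0: any output is ignored.
  [[ "${usage}" =~ ^[0-9]+$ ]] || return 1
  if [[ "${usage}" -gt 95 ]]; then
    # Printed with exit status 0, so the NodeManager will actually see
    # this line and mark the node unhealthy.
    echo "ERROR: root filesystem is ${usage}% full"
  fi
  return 0
}
check_disk
```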
[jira] [Resolved] (YARN-2345) yarn rmadmin -report
[ https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2345. Resolution: Won't Fix > yarn rmadmin -report > > > Key: YARN-2345 > URL: https://issues.apache.org/jira/browse/YARN-2345 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Reporter: Allen Wittenauer >Assignee: Hao Gao >Priority: Major > Labels: newbie > Attachments: YARN-2345.1.patch > > > It would be good to have an equivalent of hdfs dfsadmin -report in YARN. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2413) capacity scheduler will overallocate vcores
[ https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2413. Resolution: Won't Fix > capacity scheduler will overallocate vcores > --- > > Key: YARN-2413 > URL: https://issues.apache.org/jira/browse/YARN-2413 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, scheduler >Affects Versions: 2.2.0, 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > > It doesn't appear that the capacity scheduler is properly allocating vcores > when making scheduling decisions, which may result in overallocation of CPU > resources. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2429. Resolution: Won't Fix > LCE should blacklist based upon group > - > > Key: YARN-2429 > URL: https://issues.apache.org/jira/browse/YARN-2429 > Project: Hadoop YARN > Issue Type: New Feature > Components: security >Reporter: Allen Wittenauer >Priority: Major > Labels: newbie > > It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2471) DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh
[ https://issues.apache.org/jira/browse/YARN-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2471. Resolution: Won't Fix > DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh > - > > Key: YARN-2471 > URL: https://issues.apache.org/jira/browse/YARN-2471 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer >Priority: Major > > In 0.21, hadoop-layout.sh was introduced to allow for vendors to reorganize > the Hadoop distribution in a way that pleases them. > DEFAULT_YARN_APPLICATION_CLASSPATH hard-codes the paths that hadoop-layout.sh > was meant to override. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-2806) log container allocation requests
[ https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2806. Resolution: Won't Fix > log container allocation requests > - > > Key: YARN-2806 > URL: https://issues.apache.org/jira/browse/YARN-2806 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Allen Wittenauer >Assignee: Eric Wohlstadter >Priority: Major > Attachments: YARN-2806.patch > > > I might have missed it, but I don't see where we log application container > requests outside of the DEBUG context. Without this being logged, we have no > idea, on a per-application basis, of the lag an application might be experiencing in the > allocation system. > We should probably add this as an event to the RM audit log. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-3175) Consolidate the ResourceManager documentation into one
[ https://issues.apache.org/jira/browse/YARN-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-3175. Resolution: Won't Fix > Consolidate the ResourceManager documentation into one > --- > > Key: YARN-3175 > URL: https://issues.apache.org/jira/browse/YARN-3175 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Allen Wittenauer >Priority: Major > > We really don't need a different document for every individual RM feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-3484) Fix up yarn top shell code
[ https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-3484. Resolution: Won't Fix Target Version/s: (was: ) > Fix up yarn top shell code > -- > > Key: YARN-3484 > URL: https://issues.apache.org/jira/browse/YARN-3484 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Assignee: Varun Vasudev >Priority: Major > Labels: newbie > Attachments: YARN-3484.001.patch, YARN-3484.002.patch > > > We need to do some work on yarn top's shell code. > a) Just checking for TERM isn't good enough. We really need to check the > return on tput, especially since the output will not be a number but an error > string which will likely blow up the java code in horrible ways. > b) All the single bracket tests should be double brackets to force the bash > built-in. > c) I'd think I'd rather see the shell portion in a function since it's rather > large. This will allow for args, etc, to get local'ized and clean up the > case statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
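The three items YARN-3484 lists (validate tput's return rather than trusting TERM, use double-bracket tests, move the logic into a function) can be sketched roughly as below. This is a hedged illustration, not the actual bin/yarn code; the function name and fallbacks are invented.

```shell
# Sketch of the requested fixes: terminal probing lives in a function,
# tput's exit status and output are validated (tput can emit an error
# string instead of a number, which would blow up the Java side), and
# all tests use the double-bracket bash built-in.
yarn_top_term_size() {
  local cols lines
  if [[ -n "${TERM:-}" ]] && cols=$(tput cols 2>/dev/null); then
    lines=$(tput lines 2>/dev/null) || lines=24
  else
    cols=80
    lines=24
  fi
  # Only ever pass numeric values on; fall back to sane defaults.
  [[ "${cols}" =~ ^[0-9]+$ ]] || cols=80
  [[ "${lines}" =~ ^[0-9]+$ ]] || lines=24
  echo "${cols} ${lines}"
}
yarn_top_term_size
```

Putting the logic in a function also lets the variables be local'ized, as the report suggests.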
[jira] [Resolved] (YARN-4432) yarn launch script works by chance
[ https://issues.apache.org/jira/browse/YARN-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-4432. Resolution: Won't Fix > yarn launch script works by chance > -- > > Key: YARN-4432 > URL: https://issues.apache.org/jira/browse/YARN-4432 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts, yarn >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Blocker > > The YARN launch script has (at least) three big problems: > * Usage of env vars before being assigned > * Usage of env vars that are never assigned > * Assumption that HADOOP_ROOT_LOGGER allows overrides > These need to be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5064) move the shell code out of hadoop-yarn
[ https://issues.apache.org/jira/browse/YARN-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-5064. Resolution: Won't Fix > move the shell code out of hadoop-yarn > -- > > Key: YARN-5064 > URL: https://issues.apache.org/jira/browse/YARN-5064 > Project: Hadoop YARN > Issue Type: Test > Components: scripts, test >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Major > > We need to move the shell code out of hadoop-yarn so that we can properly > build test infrastructure for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5099) hadoop-yarn unit tests for dynamic commands
[ https://issues.apache.org/jira/browse/YARN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-5099. Resolution: Won't Fix > hadoop-yarn unit tests for dynamic commands > --- > > Key: YARN-5099 > URL: https://issues.apache.org/jira/browse/YARN-5099 > Project: Hadoop YARN > Issue Type: Test > Components: scripts, test >Reporter: Allen Wittenauer >Priority: Major > > This is a hold-over from HADOOP-12930, dynamic sub commands. Currently, the > yarn changes lack unit tests and they really should be there. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5530) YARN dependencies are a complete mess
[ https://issues.apache.org/jira/browse/YARN-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-5530. Resolution: Won't Fix > YARN dependencies are a complete mess > - > > Key: YARN-5530 > URL: https://issues.apache.org/jira/browse/YARN-5530 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Critical > > YARN's share/hadoop/yarn/lib is pretty much a disaster area. Multiple jars > have multiple versions. Then there are the version collisions with the rest > of Hadoop. Oh, and then there are the test jars sitting in there. > This really needs to get cleaned up since all of this stuff is on the > classpath and are likely going to cause a lot of problems down the road, > never mind the download bloat. (trunk's yarn dependencies are 2x what they > were in branch-2, thereby eliminating all the gains made by de-duping jars > across the projects.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-5454) Various places have a hard-coded location for bash
[ https://issues.apache.org/jira/browse/YARN-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-5454. Resolution: Won't Fix > Various places have a hard-coded location for bash > -- > > Key: YARN-5454 > URL: https://issues.apache.org/jira/browse/YARN-5454 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.0.0-alpha1 >Reporter: Allen Wittenauer >Priority: Major > > Lots of places in nodemanager have the location of bash hard-coded to > /bin/bash. This is not portable. bash should either be found via > /usr/bin/env or have no path at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
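A minimal illustration of the portable form YARN-5454 recommends — resolving bash through env at run time instead of hard-coding /bin/bash (the echo is only for demonstration):

```shell
#!/usr/bin/env bash
# Resolving bash via env, or using no path at all, keeps scripts working
# on systems where bash is installed elsewhere (e.g. /usr/local/bin/bash
# on some BSDs) rather than assuming /bin/bash exists.
echo "bash resolved to: $(command -v bash)"
```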
[jira] [Resolved] (YARN-5635) Better handling when bad script is configured as Node's HealthScript
[ https://issues.apache.org/jira/browse/YARN-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-5635. Resolution: Won't Fix > Better handling when bad script is configured as Node's HealthScript > > > Key: YARN-5635 > URL: https://issues.apache.org/jira/browse/YARN-5635 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Allen Wittenauer >Priority: Major > > The earlier fix for YARN-5567 was reverted because it is not ideal to bring the whole > cluster down because of a bad script. At the same time, it is important to > report that the script configured as the node health script is erroneous, since > it might otherwise fail to detect the bad health of a node. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6241) Remove -jt flag
[ https://issues.apache.org/jira/browse/YARN-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-6241. Resolution: Won't Fix > Remove -jt flag > --- > > Key: YARN-6241 > URL: https://issues.apache.org/jira/browse/YARN-6241 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha2 >Reporter: Allen Wittenauer >Priority: Major > > The -jt flag is used to send a job to a remote resourcemanager. Given the > flag name, this is clearly left over from pre-YARN days. With the addition of the > timeline server and other YARN services, the flag doesn't really work that > well anymore. It's probably better to deprecate it in 2.x and remove it from > 3.x than attempt to fix it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6452) test-container-executor should not be in bin in dist tarball
[ https://issues.apache.org/jira/browse/YARN-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-6452. Resolution: Won't Fix > test-container-executor should not be in bin in dist tarball > > > Key: YARN-6452 > URL: https://issues.apache.org/jira/browse/YARN-6452 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Allen Wittenauer >Priority: Minor > > test-container-executor should probably be in sbin or libexec or not there at > all. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-7588) Remove 'yarn historyserver' from bin/yarn
[ https://issues.apache.org/jira/browse/YARN-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-7588. Resolution: Won't Fix > Remove 'yarn historyserver' from bin/yarn > - > > Key: YARN-7588 > URL: https://issues.apache.org/jira/browse/YARN-7588 > Project: Hadoop YARN > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Priority: Minor > > 'yarn historyserver' command has been replaced with 'yarn timelineserver' > since 2.7.0. Let's remove the dead code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599711#comment-16599711 ] genericqa commented on YARN-8699: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 36s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 44s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 95m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | YARN-8699 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12938008/YARN-8699.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 3a2e3d7b6044 4.4
[jira] [Comment Edited] (YARN-8102) Retrospect on having enable and disable flag for Node Attribute
[ https://issues.apache.org/jira/browse/YARN-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599683#comment-16599683 ] Bibin A Chundatt edited comment on YARN-8102 at 9/1/18 5:32 PM: [~cheersyang]/[~Naganarasimha] A node label change can be a compatibility issue, but attributes we could change. Just for your info {quote} hadoop.tmp.dir -> default value is /tmp/hadoop-${user.name}. {quote} was (Author: bibinchundatt): [~cheersyang]/[~Naganarasimha] A node label change can be a compatibility issue, but attributes we could change. Just for your info {hadoop.tmp.dir} - default value is /tmp/hadoop-${user.name}. > Retrospect on having enable and disable flag for Node Attribute > --- > > Key: YARN-8102 > URL: https://issues.apache.org/jira/browse/YARN-8102 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > > Currently the node attribute feature is enabled by default. We have to revisit > this. > Enabling it by default means a store will be created for every cluster > installation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8102) Retrospect on having enable and disable flag for Node Attribute
[ https://issues.apache.org/jira/browse/YARN-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599683#comment-16599683 ] Bibin A Chundatt commented on YARN-8102: [~cheersyang]/[~Naganarasimha] A node label change can be a compatibility issue, but attributes we could change. Just for your info {hadoop.tmp.dir} - default value is /tmp/hadoop-${user.name}. > Retrospect on having enable and disable flag for Node Attribute > --- > > Key: YARN-8102 > URL: https://issues.apache.org/jira/browse/YARN-8102 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > > Currently the node attribute feature is enabled by default. We have to revisit > this. > Enabling it by default means a store will be created for every cluster > installation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599680#comment-16599680 ] Bibin A Chundatt commented on YARN-8699: [~giovanni.fumarola] Updated the patch to fix checkstyle issues again. Could you review the latest patch? > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8699) Add Yarnclient#yarnclusterMetrics API implementation in router
[ https://issues.apache.org/jira/browse/YARN-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-8699: --- Attachment: YARN-8699.004.patch > Add Yarnclient#yarnclusterMetrics API implementation in router > -- > > Key: YARN-8699 > URL: https://issues.apache.org/jira/browse/YARN-8699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-8699.001.patch, YARN-8699.002.patch, > YARN-8699.003.patch, YARN-8699.004.patch > > > Implement YarnclusterMetrics API in FederationClientInterceptor -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599677#comment-16599677 ] Bibin A Chundatt commented on YARN-8739: Thank you [~sunilg] for raising the issue. Minor comments: # Line length greater than 80 {quote} 191 private HashMap> applicationContainerIdMap = new HashMap>(); {quote} # proto class -- the start of the line should be upper case. Will wait for the jenkins result. > Fix jenkins issues for Node Attributes branch > - > > Key: YARN-8739 > URL: https://issues.apache.org/jira/browse/YARN-8739 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8739-YARN-3409.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8535) Fix DistributedShell unit tests
[ https://issues.apache.org/jira/browse/YARN-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599676#comment-16599676 ] Bibin A Chundatt commented on YARN-8535: +1 LGTM will commit it by tomorrow > Fix DistributedShell unit tests > --- > > Key: YARN-8535 > URL: https://issues.apache.org/jira/browse/YARN-8535 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell, timelineservice >Reporter: Eric Yang >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-8535.001.patch, YARN-8535.002.patch, > YARN-8535.003.patch > > > These tests have been failing for a while in trunk: > |[testDSShellWithoutDomainV2|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2]|1 > min 20 sec|Failed| > |[testDSShellWithoutDomainV2CustomizedFlow|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2CustomizedFlow]|1 > min 20 sec|Failed| > |[testDSShellWithoutDomainV2DefaultFlow|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2DefaultFlow]|1 > min 20 sec|Failed| > The root causes are the same: > {code:java} > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityTypeFileExists(TestDistributedShell.java:628) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:546) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:451) > at > 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:310) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2(TestDistributedShell.java:306) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599672#comment-16599672 ] Sunil Govindan commented on YARN-8739: -- Fixing checkstyle issues reported in YARN-8718 cc [~bibinchundatt] [~Naganarasimha] [~cheersyang] Pls help to review. > Fix jenkins issues for Node Attributes branch > - > > Key: YARN-8739 > URL: https://issues.apache.org/jira/browse/YARN-8739 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8739-YARN-3409.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8739) Fix jenkins issues for Node Attributes branch
[ https://issues.apache.org/jira/browse/YARN-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8739: - Attachment: YARN-8739-YARN-3409.001.patch > Fix jenkins issues for Node Attributes branch > - > > Key: YARN-8739 > URL: https://issues.apache.org/jira/browse/YARN-8739 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sunil Govindan >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8739-YARN-3409.001.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8739) Fix jenkins issues for Node Attributes branch
Sunil Govindan created YARN-8739: Summary: Fix jenkins issues for Node Attributes branch Key: YARN-8739 URL: https://issues.apache.org/jira/browse/YARN-8739 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sunil Govindan Assignee: Sunil Govindan
[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable
[ https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599643 ] Shane Kumpf commented on YARN-8638: --- Thanks for the patch [~ccondit-target]! I have been able to successfully test this feature using a pluggable runtime. I can understand your reasoning behind ignoring the remaining warnings. It would be good to open an issue (likely a HADOOP JIRA) to start a conversation about removing these checks if they don't make sense and/or fixing the current issues. Beyond the warnings, the patch LGTM. I'll commit this after the holiday.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.2.0
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, YARN-8638.003.patch, YARN-8638.004.patch
>
> YARN currently supports three different Linux container runtimes (default, docker, and javasandbox). However, it would be relatively straightforward to support arbitrary runtime implementations. This would enable easier experimentation with new and emerging runtime technologies (runc, containerd, etc.) without requiring a rebuild and redeployment of Hadoop.
>
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would now allow arbitrary values. Additionally, {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the {{LinuxContainerRuntime}} implementation to instantiate. 
> A no-argument constructor should be sufficient, as {{LinuxContainerRuntime}} already provides an {{initialize()}} method.
>
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and added to the {{LinuxContainerRuntime}} interface. This would allow {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on whether that runtime claimed ownership of the current container execution.
>
> For backwards compatibility, the existing values (default, docker, javasandbox) would continue to be supported as-is. Under the current logic, the evaluation order is javasandbox, docker, default (with default being chosen if no other candidates are available). Under the new evaluation logic, pluggable runtimes would be evaluated after docker and before default, in the order in which they are defined in the allowed-runtimes list. This changes no behavior on current clusters (as there would be no pluggable runtimes defined), and preserves behavior with respect to the ordering of existing runtimes.
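The selection order described in the proposal can be sketched as a small standalone model. This is an illustrative sketch only, not the actual Hadoop implementation: the class name {{RuntimeSelector}}, the {{register}} method, and the {{RUNTIME_TYPE}} environment key are all hypothetical stand-ins for {{DelegatingLinuxContainerRuntime}} and the proposed {{isRuntimeRequested(Map<String, String> env)}} check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of the proposed delegation logic: javasandbox and
// docker are checked first, then pluggable runtimes in allowed-runtimes
// order, with "default" as the fallback. Names here are illustrative.
public class RuntimeSelector {

    // Per-runtime "claim" checks, analogous to the proposed
    // isRuntimeRequested(Map<String, String> env) on LinuxContainerRuntime.
    private final Map<String, Predicate<Map<String, String>>> claims =
        new LinkedHashMap<>();

    public void register(String name, Predicate<Map<String, String>> claim) {
        claims.put(name, claim);
    }

    public String select(List<String> allowed, Map<String, String> env) {
        // Build the evaluation order: existing runtimes keep their
        // positions; pluggable runtimes go after docker, before default.
        List<String> order = new ArrayList<>();
        if (allowed.contains("javasandbox")) {
            order.add("javasandbox");
        }
        if (allowed.contains("docker")) {
            order.add("docker");
        }
        for (String r : allowed) {
            if (!r.equals("default") && !r.equals("docker")
                && !r.equals("javasandbox")) {
                order.add(r); // pluggable runtime, in declaration order
            }
        }
        // First runtime that claims the container wins.
        for (String r : order) {
            Predicate<Map<String, String>> claim = claims.get(r);
            if (claim != null && claim.test(env)) {
                return r;
            }
        }
        return "default"; // fallback when no runtime claims ownership
    }
}
```

With {{allowed-runtimes}} set to {{default,docker,experimental}}, a container whose environment requests the experimental runtime would be routed to it, while a container requesting nothing would fall through to {{default}} — matching the backwards-compatibility claim in the description.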