[jira] [Commented] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814127#comment-16814127 ] Tao Yang commented on YARN-9440: Attached v1 patch for review. Key changes in this patch:
* basic
** Add interface DiagnosticsCollector and its implementations ResourceDiagnosticsCollector/PlacementConstraintDiagnosticsCollector to collect resource and PC diagnostics.
** Add overloaded methods computeAvailableContainers/fitsIn to the ResourceCalculator interface and its implementations DefaultResourceCalculator/DominantResourceCalculator, and an overloaded fitsIn method to the Resources util class, to support collecting resource diagnostics.
** Add AppRequestAllocationInfo and related updates in ActivitiesLogger/ActivityNode/AllocationActivity/AppAllocation/NodeAllocation/ActivitiesInfo/ActivityNodeInfo/AppActivitiesInfo/AppAllocationInfo to adjust the data structure of activities, such as (1) adding a request level in scheduler/app activities and (2) showing property fields at the app/request/container level.
* for the scheduling process
** Add static class DiagnosticsCollectorManager and related logic in ActivitiesManager to manage collectors and enable them only when necessary, to avoid adding unnecessary overhead to the scheduler.
** Update ActivityDiagnosticConstant/LeafQueue/AppSchedulingInfo/RegularContainerAllocator/AppPlacementAllocator/LocalityAppPlacementAllocator/SingleConstraintAppPlacementAllocator/PlacementConstraintsUtil to support collecting resource/PC diagnostics and improve the diagnostics in the scheduling process.
* UT
** Add ActivitiesTestUtils to maintain common check functions for testing activities.
** Add UT in TestResourceCalculator to test collecting resource diagnostics for the resource calculators.
** Add UT in TestRMWebServicesSchedulerActivities/TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to verify changes for scheduler/app activities.
** Update UT in TestActivitiesManager to adapt to these changes.
> Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
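To make the collector idea concrete, here is a minimal, hedged sketch of what a diagnostics-collecting fitsIn overload could look like. Only the type and method names (DiagnosticsCollector, ResourceDiagnosticsCollector, fitsIn) come from the patch summary above; the signatures and bodies below are illustrative assumptions, not the actual patch.
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative sketch only: signatures and bodies are assumptions, not the actual patch.
interface DiagnosticsCollector {
  // Record why a check failed, together with the offending values.
  void collect(String reason, String details);
}

class ResourceDiagnosticsCollector implements DiagnosticsCollector {
  private final StringBuilder diagnostics = new StringBuilder();

  @Override
  public void collect(String reason, String details) {
    diagnostics.append(reason).append(" (").append(details).append("); ");
  }

  String getDiagnostics() {
    return diagnostics.toString();
  }
}

final class ResourcesSketch {
  // Hypothetical collector-aware variant of Resources#fitsIn: instead of only
  // returning false, it records which resource dimension did not fit.
  static boolean fitsIn(Resource smaller, Resource bigger,
      DiagnosticsCollector collector) {
    if (smaller.getMemorySize() > bigger.getMemorySize()) {
      collector.collect("insufficient memory",
          smaller.getMemorySize() + " > " + bigger.getMemorySize());
      return false;
    }
    if (smaller.getVirtualCores() > bigger.getVirtualCores()) {
      collector.collect("insufficient vcores",
          smaller.getVirtualCores() + " > " + bigger.getVirtualCores());
      return false;
    }
    return true;
  }
}
{code}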
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.001.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9468) Fix inaccurate documentations in Placement Constraints
[ https://issues.apache.org/jira/browse/YARN-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9468: -- Summary: Fix inaccurate documentations in Placement Constraints (was: Document Placement Constraints) > Fix inaccurate documentations in Placement Constraints > -- > > Key: YARN-9468 > URL: https://issues.apache.org/jira/browse/YARN-9468 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > Document Placement Constraints > *First* > {code:java} > zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} > * place 5 containers with tag “hbase” with affinity to a rack on which > containers with tag “zk” are running (i.e., an “hbase” container > should{color:#ff} not{color} be placed at a rack where an “zk” container > is running, given that “zk” is the TargetTag of the second constraint); > The _*not*_ word in brackets should be delete. > > *Second* > {code:java} > PlacementSpec => "" | KeyVal;PlacementSpec > {code} > The semicolon should be replaced by colon > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
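For readers of this report, the corrected spec (constraints separated by colons rather than semicolons) corresponds roughly to the following constraints built with the Java PlacementConstraints API. This is only an illustrative sketch of the intended semantics based on my reading of that API; it is not part of the documentation fix itself.
{code:java}
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.RACK;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;

public class PlacementSpecSketch {
  public static void main(String[] args) {
    // zk=3,NOTIN,NODE,zk : "zk" containers with anti-affinity to "zk" at node scope.
    PlacementConstraint zk =
        PlacementConstraints.targetNotIn(NODE, allocationTag("zk")).build();

    // hbase=5,IN,RACK,zk : "hbase" containers with affinity to "zk" at rack scope.
    PlacementConstraint hbase =
        PlacementConstraints.targetIn(RACK, allocationTag("zk")).build();

    // spark=7,CARDINALITY,NODE,hbase,1,3 : "spark" containers on nodes running
    // between 1 and 3 "hbase" containers.
    PlacementConstraint spark =
        PlacementConstraints.cardinality(NODE, 1, 3, "hbase").build();

    System.out.println(zk + "\n" + hbase + "\n" + spark);
  }
}
{code}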
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814114#comment-16814114 ] Hadoop QA commented on YARN-6929: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 2s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 416 unchanged - 12 fixed = 416 total (was 428) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 57s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 0s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 47s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}106m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-6929 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965411/YARN-6929-009.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
[jira] [Created] (YARN-9468) Document Placement Constraints
hunshenshi created YARN-9468: Summary: Document Placement Constraints Key: YARN-9468 URL: https://issues.apache.org/jira/browse/YARN-9468 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.2.0 Reporter: hunshenshi Document Placement Constraints *First* {code:java} zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} * place 5 containers with tag “hbase” with affinity to a rack on which containers with tag “zk” are running (i.e., an “hbase” container should{color:#FF} not{color} be placed at a rack where an “zk” container is running, given that “zk” is the TargetTag of the second constraint); The _*not*_ word in brackets should be deleted. *Second* {code:java} PlacementSpec => "" | KeyVal;PlacementSpec {code} The semicolon should be replaced by a colon -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9468) Document Placement Constraints
[ https://issues.apache.org/jira/browse/YARN-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9468: - Description: Document Placement Constraints *First* {code:java} zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} * place 5 containers with tag “hbase” with affinity to a rack on which containers with tag “zk” are running (i.e., an “hbase” container should{color:#ff} not{color} be placed at a rack where an “zk” container is running, given that “zk” is the TargetTag of the second constraint); The _*not*_ word in brackets should be delete. *Second* {code:java} PlacementSpec => "" | KeyVal;PlacementSpec {code} The semicolon should be replaced by colon was: Document Placement Constraints *First* {code:java} zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} * place 5 containers with tag “hbase” with affinity to a rack on which containers with tag “zk” are running (i.e., an “hbase” container should{color:#FF} not{color} be placed at a rack where an “zk” container is running, given that “zk” is the TargetTag of the second constraint); The _*not*_ word in brackets should be delete. *Second* {code:java} PlacementSpec => "" | KeyVal;PlacementSpec {code} The semicolon should be replaced by colon > Document Placement Constraints > -- > > Key: YARN-9468 > URL: https://issues.apache.org/jira/browse/YARN-9468 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > Document Placement Constraints > *First* > {code:java} > zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} > > * place 5 containers with tag “hbase” with affinity to a rack on which > containers with tag “zk” are running (i.e., an “hbase” container > should{color:#ff} not{color} be placed at a rack where an “zk” container > is running, given that “zk” is the TargetTag of the second constraint); > The _*not*_ word in brackets should be delete. > > *Second* > {code:java} > PlacementSpec => "" | KeyVal;PlacementSpec > {code} > The semicolon should be replaced by colon > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9468) Document Placement Constraints
[ https://issues.apache.org/jira/browse/YARN-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hunshenshi updated YARN-9468: - Description: Document Placement Constraints *First* {code:java} zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} * place 5 containers with tag “hbase” with affinity to a rack on which containers with tag “zk” are running (i.e., an “hbase” container should{color:#ff} not{color} be placed at a rack where an “zk” container is running, given that “zk” is the TargetTag of the second constraint); The _*not*_ word in brackets should be delete. *Second* {code:java} PlacementSpec => "" | KeyVal;PlacementSpec {code} The semicolon should be replaced by colon was: Document Placement Constraints *First* {code:java} zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} * place 5 containers with tag “hbase” with affinity to a rack on which containers with tag “zk” are running (i.e., an “hbase” container should{color:#ff} not{color} be placed at a rack where an “zk” container is running, given that “zk” is the TargetTag of the second constraint); The _*not*_ word in brackets should be delete. *Second* {code:java} PlacementSpec => "" | KeyVal;PlacementSpec {code} The semicolon should be replaced by colon > Document Placement Constraints > -- > > Key: YARN-9468 > URL: https://issues.apache.org/jira/browse/YARN-9468 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > > Document Placement Constraints > *First* > {code:java} > zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk:spark=7,CARDINALITY,NODE,hbase,1,3{code} > * place 5 containers with tag “hbase” with affinity to a rack on which > containers with tag “zk” are running (i.e., an “hbase” container > should{color:#ff} not{color} be placed at a rack where an “zk” container > is running, given that “zk” is the TargetTag of the second constraint); > The _*not*_ word in brackets should be delete. > > *Second* > {code:java} > PlacementSpec => "" | KeyVal;PlacementSpec > {code} > The semicolon should be replaced by colon > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: (was: YARN-9440.001.patch) > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929-009.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929-009.patch, YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, > YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, > YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.ha
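To illustrate the proposed layout change, here is a small sketch of how the dated remote log directory could be composed. The class name, method, and date format below are assumptions for illustration only; the actual patch may build the path differently.
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.fs.Path;

public class RemoteLogDirLayoutSketch {
  // Old layout: {remote-app-log-dir}/{user}/logs/{applicationId}
  // Proposed layout: {remote-app-log-dir}/{user}/logs/{date}/{applicationId},
  // which bounds the children of any single "logs" directory to one day's apps.
  public static Path remoteAppLogDir(Path remoteRootLogDir, String user,
      String applicationId) {
    String date = new SimpleDateFormat("yyyyMMdd").format(new Date());
    return new Path(new Path(new Path(remoteRootLogDir, user), "logs"),
        new Path(date, applicationId));
  }
}
{code}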
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814033#comment-16814033 ] Prabhu Joseph commented on YARN-6929: - Build Failure is due to YARN-999 and fixed in the YARN-999.addendum.patch. Will resubmit the patch. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apach
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814031#comment-16814031 ] Prabhu Joseph commented on YARN-9464: - [~tangzhankun] The failed test cases are not related; I have reported YARN-9467 and YARN-9325 to fix the intermittent test case failures. The findbugs warnings are also unrelated. I have fixed the checkstyle issues. > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch, YARN-9464-002.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
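As a rough illustration of how a management tool could consume the proposed metric, the sketch below polls the cluster metrics endpoint mentioned in the description. The field names pendingMB and pendingVirtualCores are assumptions about what the patch exposes, and the RM host/port is a placeholder.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PendingResourceProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder RM address; a real tool would read this from configuration.
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/metrics");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    } finally {
      conn.disconnect();
    }
    // A JSON library would normally parse this; the assumed fields of interest
    // under "clusterMetrics" are "pendingMB" and "pendingVirtualCores".
    System.out.println(body);
  }
}
{code}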
[jira] [Updated] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9464: Attachment: YARN-9464-002.patch > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch, YARN-9464-002.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9467) TestCapacitySchedulerNodeLabelUpdate.testResourceUsageWhenNodeUpdatesPartition fails intermittent
Prabhu Joseph created YARN-9467: --- Summary: TestCapacitySchedulerNodeLabelUpdate.testResourceUsageWhenNodeUpdatesPartition fails intermittent Key: YARN-9467 URL: https://issues.apache.org/jira/browse/YARN-9467 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, test Affects Versions: 3.2.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph TestCapacitySchedulerNodeLabelUpdate.testResourceUsageWhenNodeUpdatesPartition fails intermittent - observed in YARN-9464 {code:java} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.checkUserUsedResource(TestCapacitySchedulerNodeLabelUpdate.java:191) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate.testResourceUsageWhenNodeUpdatesPartition(TestCapacitySchedulerNodeLabelUpdate.java:410) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
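The stack trace suggests checkUserUsedResource dereferences the user's resource usage before the scheduler has registered the user after the partition update. One possible direction, sketched below, is to poll until the user is visible before asserting. The helper is hypothetical (it assumes it lives inside the test class, next to checkUserUsedResource, with the usual JUnit imports) and is not the actual fix.
{code:java}
// Hypothetical helper for TestCapacitySchedulerNodeLabelUpdate: wait until the
// scheduler has registered the user in the queue before asserting on its usage.
private void waitForUser(CapacityScheduler scheduler, String queueName,
    String userName, long timeoutMs) throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (System.currentTimeMillis() < deadline) {
    LeafQueue queue = (LeafQueue) scheduler.getQueue(queueName);
    if (queue != null && queue.getUser(userName) != null) {
      return; // checkUserUsedResource can now read the user's resource usage safely
    }
    Thread.sleep(100);
  }
  Assert.fail("User " + userName + " was not registered in queue " + queueName
      + " within " + timeoutMs + " ms");
}
{code}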
[jira] [Updated] (YARN-9440) Improve diagnostics for scheduler and app activities
[ https://issues.apache.org/jira/browse/YARN-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9440: --- Attachment: YARN-9440.001.patch > Improve diagnostics for scheduler and app activities > > > Key: YARN-9440 > URL: https://issues.apache.org/jira/browse/YARN-9440 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9440.001.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.cyw6zeehzqmx] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9379) Can't specify docker runtime through environment
[ https://issues.apache.org/jira/browse/YARN-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16814000#comment-16814000 ] caozhiqiang commented on YARN-9379: --- OK, your suggestion is very good, and I will follow it. Thank you! > Can't specify docker runtime through environment > > > Key: YARN-9379 > URL: https://issues.apache.org/jira/browse/YARN-9379 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.3.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Minor > Attachments: YARN-9379-branch-3.2.0.001.patch, YARN-9379.002.patch, > YARN-9379.003.patch, YARN-9379.004.patch > > > When using docker to run yarn containers, even though there is > docker.allowed.runtimes in container-executor.cfg, there is no parameter to > specify the docker runtime, such as gvisor, lxc or kata. With this patch, > clients can add a parameter such as > -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME=runsc > to specify the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9439) Support asynchronized scheduling mode and multi-node lookup mechanism for app activities
[ https://issues.apache.org/jira/browse/YARN-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813997#comment-16813997 ] Tao Yang commented on YARN-9439: Hi, [~cheersyang]. The findbugs warnings are not related to this patch. Findbugs does not seem to expect a null input for SettableFuture#set even though it is declared as Nullable; perhaps findbugs supports javax.annotation.Nullable but not org.checkerframework.checker.nullness.qual.Nullable. I think we can exclude these warnings in findbugs-exclude.xml. Can you share your thoughts on this? > Support asynchronized scheduling mode and multi-node lookup mechanism for app > activities > > > Key: YARN-9439 > URL: https://issues.apache.org/jira/browse/YARN-9439 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9439.001.patch, YARN-9439.002.patch, > YARN-9439.003.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.m051gyiikx7c] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
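For reference, excluding a findbugs warning of this kind is typically done with a Match entry in the module's dev-support/findbugs-exclude.xml. The class name and bug pattern below are assumptions about which warning fired; they are shown only to illustrate the shape of such an exclusion, not the final entry.
{code:xml}
<!-- Illustrative entry only; the actual class and pattern would need to match
     the two reported warnings. Goes inside the existing <FindBugsFilter> root. -->
<Match>
  <Class name="org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager" />
  <Bug pattern="NP_NONNULL_PARAM_VIOLATION" />
</Match>
{code}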
[jira] [Commented] (YARN-9466) App catalog navigation stylesheet does not display correctly in Safari
[ https://issues.apache.org/jira/browse/YARN-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813920#comment-16813920 ] Hadoop QA commented on YARN-9466: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 28m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 56s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9466 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965375/YARN-9466.001.patch | | Optional Tests | dupname asflicense shadedclient | | uname | Linux 85182118401c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 25c421b | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 443 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23925/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > App catalog navigation stylesheet does not display correctly in Safari > -- > > Key: YARN-9466 > URL: https://issues.apache.org/jira/browse/YARN-9466 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9466.001.patch, catalog-chrome.png, > catalog-safari.png > > > When navigation side bar has less content than right side table, the > navigation bar will shrink into smaller size in Safari. See the attached > screenshot for the problem and desired looked. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813900#comment-16813900 ] Hudson commented on YARN-999: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16369 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16369/]) YARN-999. In case of long running tasks, reduce node resource should (gifuma: rev 358e9286223029ba28f5afe0f7433d95a735b78f) * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java * (edit) hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch, > YARN-999.addendum.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813843#comment-16813843 ] Hadoop QA commented on YARN-9463: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 17m 42s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 16s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 32s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 52s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestResourceTrackerService | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9463 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965360/YARN-9463.1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0f9c7f7d44ac 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cfec455 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/23923/artifact/out/branch-mvninstall-root.txt | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/23923/artifact/out/branch-findbugs-hadoop-yarn-project
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813826#comment-16813826 ] Giovanni Matteo Fumarola commented on YARN-999: --- Committed the addendum to trunk. Thanks [~ste...@apache.org] for raising the issue and [~elgoiri] for fixing it. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch, > YARN-999.addendum.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813822#comment-16813822 ] Hadoop QA commented on YARN-6929: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 16m 24s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 10s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 416 unchanged - 12 fixed = 416 total (was 428) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 44s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}109m 29s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-6929 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965359/YARN-6929-008.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle
[jira] [Updated] (YARN-9466) App catalog navigation stylesheet does not display correctly in Safari
[ https://issues.apache.org/jira/browse/YARN-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9466: Attachment: YARN-9466.001.patch > App catalog navigation stylesheet does not display correctly in Safari > -- > > Key: YARN-9466 > URL: https://issues.apache.org/jira/browse/YARN-9466 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9466.001.patch, catalog-chrome.png, > catalog-safari.png > > > When navigation side bar has less content than right side table, the > navigation bar will shrink into smaller size in Safari. See the attached > screenshot for the problem and desired looked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813811#comment-16813811 ] Íñigo Goiri commented on YARN-999: -- Compiled locally. Let's go with [^YARN-999.addendum.patch]. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch, > YARN-999.addendum.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9466) App catalog navigation stylesheet does not display correctly in Safari
[ https://issues.apache.org/jira/browse/YARN-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9466: Attachment: catalog-safari.png > App catalog navigation stylesheet does not display correctly in Safari > -- > > Key: YARN-9466 > URL: https://issues.apache.org/jira/browse/YARN-9466 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: catalog-chrome.png, catalog-safari.png > > > When navigation side bar has less content than right side table, the > navigation bar will shrink into smaller size in Safari. See the attached > screenshot for the problem and desired looked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9466) App catalog navigation stylesheet does not display correctly in Safari
[ https://issues.apache.org/jira/browse/YARN-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9466: Attachment: catalog-chrome.png > App catalog navigation stylesheet does not display correctly in Safari > -- > > Key: YARN-9466 > URL: https://issues.apache.org/jira/browse/YARN-9466 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: catalog-chrome.png, catalog-safari.png > > > When the navigation side bar has less content than the right-side table, the > navigation bar will shrink to a smaller size in Safari. See the attached > screenshots for the problem and the desired look. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9379) Can't specify docker runtime through environment
[ https://issues.apache.org/jira/browse/YARN-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813797#comment-16813797 ] Eric Badger commented on YARN-9379: --- Thanks for the update, [~caozhiqiang]. The findbugs is indeed unrelated. Don't worry about the version either. As for the patch, it looks pretty good. However, I have a few comments. All of the other environment variables that are used in {{DockerLinuxContainerRuntime.launchContainer()}} have a validation step against a yarn-site.xml config. This is so that we can fail fast, without having to invoke the container executor, if we know that we aren't going to want to launch with these environment variable configurations. The container-executor.cfg ({{docker.allowed.runtimes}}) is the ultimate source of truth, but the yarn-site.xml configs act as a fail-fast first line of defense. So it would be good to add that config and accompanying validation for the allowed runtimes. On the unit test, it looks like you're checking the size of the created docker command, but not checking to see if the runtime was actually set correctly. Could you add some code to check that the runtime was correctly set to what you want it to be? You will also want to add a test case to make sure that the allowed list works. > Can't specify docker runtime through environment > > > Key: YARN-9379 > URL: https://issues.apache.org/jira/browse/YARN-9379 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.3.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Minor > Attachments: YARN-9379-branch-3.2.0.001.patch, YARN-9379.002.patch, > YARN-9379.003.patch, YARN-9379.004.patch > > > When using docker to run YARN containers, even though there is > docker.allowed.runtimes in container-executor.cfg, there is no parameter to > specify the docker runtime, such as gvisor, lxc or kata. With this patch, the > client can add a parameter such as > -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME=runsc > to specify the docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
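A minimal sketch of the fail-fast validation described in the comment above, assuming a hypothetical yarn-site.xml property name and helper method (neither is taken from the actual YARN-9379 patch):
{code:java}
// Hedged sketch: reject a requested docker runtime before invoking the
// container executor, based on an allowed list read from yarn-site.xml.
// The property name below is an assumption for illustration only;
// container-executor.cfg (docker.allowed.runtimes) remains the final check.
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

public class DockerRuntimeValidationSketch {
  static final String ALLOWED_RUNTIMES_KEY =
      "yarn.nodemanager.runtime.linux.docker.allowed-container-runtimes";

  static void validateRuntime(Configuration conf, Map<String, String> env) {
    String requested = env.get("YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME");
    if (requested == null || requested.isEmpty()) {
      return; // nothing requested, docker's default runtime is used
    }
    List<String> allowed =
        Arrays.asList(conf.getStrings(ALLOWED_RUNTIMES_KEY, "runc"));
    if (!allowed.contains(requested)) {
      throw new IllegalArgumentException("Docker runtime '" + requested
          + "' is not in the allowed list " + allowed);
    }
  }
}
{code}
The real runtime code would presumably throw a ContainerExecutionException rather than IllegalArgumentException; the sketch only shows the shape of the fail-fast check.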
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813795#comment-16813795 ] Giovanni Matteo Fumarola commented on YARN-999: --- The addendum looks ok. Should we commit to unblock the branch or just wait yetus result? > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch, > YARN-999.addendum.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-999: - Attachment: YARN-999.addendum.patch > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch, > YARN-999.addendum.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9466) App catalog navigation stylesheet does not display correctly in Safari
Eric Yang created YARN-9466: --- Summary: App catalog navigation stylesheet does not display correctly in Safari Key: YARN-9466 URL: https://issues.apache.org/jira/browse/YARN-9466 Project: Hadoop YARN Issue Type: Sub-task Reporter: Eric Yang Assignee: Eric Yang When the navigation side bar has less content than the right-side table, the navigation bar will shrink to a smaller size in Safari. See the attached screenshots for the problem and the desired look. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813752#comment-16813752 ] Íñigo Goiri commented on YARN-999: -- Do you prefer addendum or a new JIRA? > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813745#comment-16813745 ] Steve Loughran commented on YARN-999: - no worries, gone back one commit locally > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened YARN-999: - > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813740#comment-16813740 ] Giovanni Matteo Fumarola commented on YARN-999: --- Thanks [~ste...@apache.org], we are fixing it right now. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813737#comment-16813737 ] Hadoop QA commented on YARN-9464: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 16s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 10 unchanged - 2 fixed = 11 total (was 12) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}141m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9464 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965342/YARN-9464-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux b11c162342ff 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8ef3bc8 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/23921/artifact/out
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813733#comment-16813733 ] Steve Loughran commented on YARN-999: - I think this has broken the build {code} [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-sls: Compilation failure: Compilation failure: [ERROR] /Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java:[60,18] org.apache.hadoop.yarn.sls.nodemanager.NodeInfo.FakeRMNodeImpl is not abstract and does not override abstract method resetUpdatedCapability() in org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode [ERROR] /Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java:[47,8] org.apache.hadoop.yarn.sls.scheduler.RMNodeWrapper is not abstract and does not override abstract method resetUpdatedCapability() in org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeI {code} Can you fix this ASAP, so we don't have to roll things back. Or change the modified interface to have some default functions. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
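One way to read the "default functions" suggestion above: give the newly added RMNode method a no-op default so that downstream implementations such as the hadoop-sls fakes keep compiling. A rough sketch only, assuming RMNode can carry a default method; this is not the fix that was actually committed:
{code:java}
// Illustrative fragment, not the real RMNode interface: a default no-op
// implementation of the newly added method avoids breaking existing
// implementors like NodeInfo.FakeRMNodeImpl and RMNodeWrapper in hadoop-sls.
public interface RMNode {
  // ... existing RMNode methods ...

  default void resetUpdatedCapability() {
    // no-op by default; real node implementations override this
  }
}
{code}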
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813683#comment-16813683 ] Hudson commented on YARN-999: - FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16367 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16367/]) YARN-999. In case of long running tasks, reduce node resource should (gifuma: rev cfec455c452d85229ef2f9d83e6f7fc827946b59) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerOvercommit.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNode.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestAbstractYarnScheduler.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerOvercommit.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/hadoop-metrics2-resourcemanager.properties * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceOption.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerOvercommit.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/hadoop-metrics2.properties * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. 
> --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.
[ https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813679#comment-16813679 ] Giovanni Matteo Fumarola commented on YARN-9435: Thanks [~abmodi] for the patch. Why do we need a Thread.sleep(1000); in the unit test? > Add Opportunistic Scheduler metrics in ResourceManager. > --- > > Key: YARN-9435 > URL: https://issues.apache.org/jira/browse/YARN-9435 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9435.001.patch, YARN-9435.002.patch, > YARN-9435.003.patch > > > Right now there are no metrics available for Opportunistic Scheduler at > ResourceManager. As part of this jira, we will add metrics like number of > allocated opportunistic containers, released opportunistic containers, node > level allocations, rack level allocations etc. for Opportunistic Scheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
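On the {{Thread.sleep(1000)}} question: when a metric is updated asynchronously, tests usually poll for the expected value with a timeout instead of sleeping a fixed interval. A hedged sketch of that pattern with a hypothetical metrics getter (not the code in the YARN-9435 patch):
{code:java}
// Sketch: poll an asynchronously updated counter instead of a single long sleep.
// The metrics interface and getter below are stand-ins, not the patch's classes.
import static org.junit.Assert.assertEquals;

public class OpportunisticMetricsPollingSketch {
  interface OpportunisticSchedulerMetrics {
    long getAllocatedContainers();
  }

  static void waitForAllocatedContainers(OpportunisticSchedulerMetrics metrics,
      long expected, long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (metrics.getAllocatedContainers() != expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(50); // short poll interval, bounded by the timeout
    }
    assertEquals(expected, metrics.getAllocatedContainers());
  }
}
{code}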
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813676#comment-16813676 ] Íñigo Goiri commented on YARN-999: -- Thank you very much [~giovanni.fumarola]! > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813674#comment-16813674 ] Giovanni Matteo Fumarola commented on YARN-999: --- Committed to trunk. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-999: -- Fix Version/s: 3.3.0 > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-291.000.patch, YARN-999.001.patch, > YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, > YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, > YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch > > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be used here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813661#comment-16813661 ] Prabhu Joseph commented on YARN-6929: - [~pbacsko] Thanks for reviewing. Have attached 008 patch addressing above comments. Please review the same when you get time. Below are the details 1. During upgrade the job will have some log files in old app log dir and some in new app log dir. The reader logic has to return node log files from both places. The logic returns an Iterator of node files only from new app log dir if {{yarn.nodemanager.remote-app-log-dir-include-older}} is false. Else, a combined iterator which traverses both old and new log files. {{nodeFilesPrev}} and {{nodeFilesCur}} are iterators of old and new app log dir respectively. Have added comments and few changes in code to make it more readable. 2. Have used {{diagnosticsMsg}} in all places. 3. {{nodeFilesCur}} can be null only if there is an {{IOException}} (new app log dir does not exist or error when reading). In this case, throw the captured {{diagnosticsMsg}} else the {{nodeFilesCur}}. 4. {{diagnosticsMsg}} is appended max twice and also with {{IOException#getMessage()}} which is a limited one without stacktrace. 5. Have addressed this one. > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. 
This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org
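A rough illustration of the combined-iterator idea from the YARN-6929 comment above, where {{nodeFilesPrev}} (old app-log dir) is exhausted before {{nodeFilesCur}} (new app-log dir). Only the two iterator names come from the comment; the wrapper class itself is an assumption, not the patch:
{code:java}
// Sketch: walk old-layout node log files first, then the new-layout ones.
import java.util.Iterator;
import java.util.NoSuchElementException;
import org.apache.hadoop.fs.FileStatus;

class CombinedNodeFileIteratorSketch implements Iterator<FileStatus> {
  private final Iterator<FileStatus> nodeFilesPrev; // old app-log dir
  private final Iterator<FileStatus> nodeFilesCur;  // new app-log dir

  CombinedNodeFileIteratorSketch(Iterator<FileStatus> prev,
      Iterator<FileStatus> cur) {
    this.nodeFilesPrev = prev;
    this.nodeFilesCur = cur;
  }

  @Override
  public boolean hasNext() {
    return nodeFilesPrev.hasNext() || nodeFilesCur.hasNext();
  }

  @Override
  public FileStatus next() {
    if (nodeFilesPrev.hasNext()) {
      return nodeFilesPrev.next();
    }
    if (nodeFilesCur.hasNext()) {
      return nodeFilesCur.next();
    }
    throw new NoSuchElementException();
  }
}
{code}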
[jira] [Commented] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813645#comment-16813645 ] Aihua Xu commented on YARN-9463: Simple fix: the error will print out queue info as well now. > Add queueName info when failing with queue capacity sanity check > > > Key: YARN-9463 > URL: https://issues.apache.org/jira/browse/YARN-9463 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.9.1 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: YARN-9463.1.patch > > > In the queue sanity check of CSQueueUtils.java, we are throwing "Illegal queue > capacity setting, (abs-capacity=0.00160782) > > (abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add the queue name > so admins can identify the problematic queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
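The change boils down to naming the offending queue (and label) in the sanity-check message. A hedged sketch of what such a check might look like; the method and parameter names are assumptions rather than the exact CSQueueUtils code:
{code:java}
// Sketch: include the queue name in the capacity sanity-check error message.
public class QueueCapacityCheckSketch {
  static void checkAbsoluteCapacity(String queueName, String label,
      float absCapacity, float absMaxCapacity) {
    if (absCapacity > absMaxCapacity) {
      throw new IllegalArgumentException("Illegal queue capacity setting,"
          + " (abs-capacity=" + absCapacity + ") > (abs-maximum-capacity="
          + absMaxCapacity + ") for queue=" + queueName
          + " when label=" + label);
    }
  }
}
{code}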
[jira] [Updated] (YARN-9463) Add queueName info when failing with queue capacity sanity check
[ https://issues.apache.org/jira/browse/YARN-9463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated YARN-9463: --- Attachment: YARN-9463.1.patch > Add queueName info when failing with queue capacity sanity check > > > Key: YARN-9463 > URL: https://issues.apache.org/jira/browse/YARN-9463 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 2.9.1 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Trivial > Attachments: YARN-9463.1.patch > > > In the queue sanity check of CSQueueUtils.java, we are throwing "Illegal queue > capacity setting, (abs-capacity=0.00160782) > > (abs-maximum-capacity=0.0016027201). When label=[]". It would be better to add the queue name > so admins can identify the problematic queue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable
[ https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-6929: Attachment: YARN-6929-008.patch > yarn.nodemanager.remote-app-log-dir structure is not scalable > - > > Key: YARN-6929 > URL: https://issues.apache.org/jira/browse/YARN-6929 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6929-007.patch, YARN-6929-008.patch, > YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, YARN-6929.3.patch, > YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, YARN-6929.patch > > > The current directory structure for yarn.nodemanager.remote-app-log-dir is > not scalable. Maximum Subdirectory limit by default is 1048576 (HDFS-6102). > With retention yarn.log-aggregation.retain-seconds of 7days, there are more > chances LogAggregationService fails to create a new directory with > FSLimitException$MaxDirectoryItemsExceededException. > The current structure is > //logs/. This can be > improved with adding date as a subdirectory like > //logs// > {code} > WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: > Application failed to init aggregation > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221) > > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194) > > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813) > > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600) > > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308) > > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443) > > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): > The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 > items=1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021) > > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072) > > at > org.apache.hadoop.hdfs.server.namenod
[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try
[ https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813627#comment-16813627 ] Eric Yang commented on YARN-7848: - [~ebadger] [~Jim_Brennan] The failed unit test is not related to patch 003. Please review. thanks > Force removal of docker containers that do not get removed on first try > --- > > Key: YARN-7848 > URL: https://issues.apache.org/jira/browse/YARN-7848 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Badger >Assignee: Eric Yang >Priority: Major > Labels: Docker > Attachments: YARN-7848.001.patch, YARN-7848.002.patch, > YARN-7848.003.patch > > > After the addition of YARN-5366, containers will get removed after a certain > debug delay. However, this is a one-time effort. If the removal fails for > whatever reason, the container will persist. We need to add a mechanism for a > forced removal of those containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9439) Support asynchronized scheduling mode and multi-node lookup mechanism for app activities
[ https://issues.apache.org/jira/browse/YARN-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813531#comment-16813531 ] Hadoop QA commented on YARN-9439: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9439 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965321/YARN-9439.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0f0017efe62a 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 73f43ac | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/23919/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23919/testReport/ | | Max. process+thread count | 886 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn
[jira] [Created] (YARN-9465) Support to alter table properties in HBaseTimelineSchemaCreator
Tarun Parimi created YARN-9465: -- Summary: Support to alter table properties in HBaseTimelineSchemaCreator Key: YARN-9465 URL: https://issues.apache.org/jira/browse/YARN-9465 Project: Hadoop YARN Issue Type: Sub-task Components: timelinereader Affects Versions: 3.2.0 Reporter: Tarun Parimi Assignee: Tarun Parimi HBaseTimelineSchemaCreator currently only creates tables if they don't exist. Only creating HBase tables without altering them is the desired behavior for most use cases. However, in certain scenarios we might need to alter tables. For example, after an upgrade we might need to point to a new coprocessor jar in yarn.timeline-service.hbase.coprocessor.jar.hdfs.location. A user might also want to change the TTL of the tables afterwards to control data retention. Currently the user has to manually find the tables related to ATSv2 and alter them with the hbase shell, which is not straightforward. To support such scenarios, it will be useful to have an option to alter tables if required in HBaseTimelineSchemaCreator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
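A hedged sketch of what an opt-in alter step could look like, using the HBase 2.x Admin API to change an attribute such as a column-family TTL on an existing ATSv2 table. The table and family names follow the usual ATSv2 defaults, but the whole approach is an assumption about the proposal, not the eventual HBaseTimelineSchemaCreator change:
{code:java}
// Sketch: alter an existing timeline-service table instead of only creating it.
// Assumes the HBase 2.x client API; wiring this behind a schema-creator flag
// is left out.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterTimelineTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      TableName table = TableName.valueOf("prod.timelineservice.entity");
      TableDescriptor current = admin.getDescriptor(table);
      // Rebuild the descriptor with a new TTL on the "i" (info) column family.
      TableDescriptor updated = TableDescriptorBuilder.newBuilder(current)
          .modifyColumnFamily(ColumnFamilyDescriptorBuilder
              .newBuilder(current.getColumnFamily(Bytes.toBytes("i")))
              .setTimeToLive(30 * 24 * 60 * 60) // example: 30 days
              .build())
          .build();
      admin.modifyTable(updated);
    }
  }
}
{code}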
[jira] [Commented] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813460#comment-16813460 ] Prabhu Joseph commented on YARN-9464: - [~tangzhankun] Have updated pending resource for Cluster Metrics. I think it is not applicable for Node level metrics. > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
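For reference, a small client-side sketch of hitting the cluster metrics endpoint named in the description; the exact pending-resource JSON fields the patch adds are an assumption here, not confirmed output:
{code:java}
// Sketch: fetch RM cluster metrics as JSON; with the patch applied the payload
// is expected to also carry pending-resource counters alongside the allocated
// and available resources (exact field names are an assumption).
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ClusterMetricsProbe {
  public static void main(String[] args) throws Exception {
    // Usage: ClusterMetricsProbe http://<rm-host>:8088/ws/v1/cluster/metrics
    URL url = new URL(args[0]);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      in.lines().forEach(System.out::println);
    }
  }
}
{code}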
[jira] [Commented] (YARN-7282) Shared Cache Phase 2
[ https://issues.apache.org/jira/browse/YARN-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813446#comment-16813446 ] Laurenceau Julien commented on YARN-7282: - Hi, Any chance to extend (in the long term) this Yarn cache to support data caching like Apache Ignite ? Regards > Shared Cache Phase 2 > > > Key: YARN-7282 > URL: https://issues.apache.org/jira/browse/YARN-7282 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chris Trezzo >Priority: Major > > Phase 2 will address more features that need to be built as part of the > shared cache project. See YARN-1492 for the first release of the shared cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9464: Attachment: YARN-9464-001.patch > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9464-001.patch > > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9464) Support "Pending Resource" metrics in RM's RESTful API
[ https://issues.apache.org/jira/browse/YARN-9464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9464: Description: Knowing only the "available", "used" resource is not enough for YARN management tools like auto-scaler. It would be helpful to diagnose the cluster resource utilization if it gets "Pending Resource" from RM RESTful APIs. In a certain extent, it represents how starving the applications are. Initially, we can add "pending resource" information in below two RM REST APIs: {code:java} RMnode:port/ws/v1/cluster/metrics RMnode:port/ws/v1/cluster/nodes {code} was: Knowing only the "available", "used" resource is not enough for YARN management tools like auto-scaler. It would be helpful to diagnose the cluster resource utilization if it gets "Pending Resource" from RM RESTful APIs. In a certain extent, it represents how starving the applications are. Initially, we can add "pending resource" information in below two RM REST APIs: {code:java} RMnode:port/ws/v1/cluster/metrics RMnode:port/ws/v1/cluster/metrics/nodes {code} > Support "Pending Resource" metrics in RM's RESTful API > -- > > Key: YARN-9464 > URL: https://issues.apache.org/jira/browse/YARN-9464 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Prabhu Joseph >Priority: Major > > Knowing only the "available", "used" resource is not enough for YARN > management tools like auto-scaler. It would be helpful to diagnose the > cluster resource utilization if it gets "Pending Resource" from RM RESTful > APIs. In a certain extent, it represents how starving the applications are. > Initially, we can add "pending resource" information in below two RM REST > APIs: > {code:java} > RMnode:port/ws/v1/cluster/metrics > RMnode:port/ws/v1/cluster/nodes > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8920) LogAggregation should be configurable to allow writing to underlying storage as appOwner or yarn user
[ https://issues.apache.org/jira/browse/YARN-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813406#comment-16813406 ] Hadoop QA commented on YARN-8920: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8920 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8920 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945250/YARN-8920.6.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23920/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > LogAggregation should be configurable to allow writing to underlying storage > as appOwner or yarn user > - > > Key: YARN-8920 > URL: https://issues.apache.org/jira/browse/YARN-8920 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, yarn >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8920.1.patch, YARN-8920.2.patch, YARN-8920.3.patch, > YARN-8920.4.patch, YARN-8920.5.patch, YARN-8920.6.patch > > > Currently NM Log Aggregation does not support writing to the underlying storage > as the "yarn" user. This would be needed when writing to storage such as S3, > which does not support POSIX-compliant ACLs: a single access key would be used > for writes, and app owners would be allowed to read the logs with their own > access keys. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9433) Remove unused constants from RMAuditLogger
[ https://issues.apache.org/jira/browse/YARN-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813355#comment-16813355 ] Igor Rudenko commented on YARN-9433: Regarding the hadoop-yetus [analysis|https://github.com/apache/hadoop/pull/706#issuecomment-480943190]: {{test4tests}}: only unused code was deleted, so there is no reason to add or change any unit tests. {{findbugs}}: the 2 warnings aren't related to the code changes. {{unit}}: the failures aren't caused by this fix. {{asflicense}}: the license warning is about a missing license header, but the modified file does have the license header. > Remove unused constants from RMAuditLogger > -- > > Key: YARN-9433 > URL: https://issues.apache.org/jira/browse/YARN-9433 > Project: Hadoop YARN > Issue Type: Task > Components: yarn >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Minor > Labels: newbie > > There are some unused constants in RMAuditLogger that IntelliJ warns you > about. > Currently what I'm seeing is that the following {{public static final > String}} constants are unused: > * AM_ALLOCATE > * CHANGE_CONTAINER_RESOURCE > * CREATE_NEW_RESERVATION_REQUEST > They are probably no longer needed. This task aims to remove those unused > constants. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
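For context, a minimal sketch of the sort of constants being removed. The constant names come from the issue description; the enclosing class shape and the string literal values are assumed for illustration and are not copied from the actual RMAuditLogger source.

{code:java}
// Sketch only: shape and literal values assumed, names taken from the issue description.
public static class AuditConstants {
  // ... constants still referenced by the audit code paths ...

  // Unused constants this task proposes to delete:
  public static final String AM_ALLOCATE = "AM Allocate";
  public static final String CHANGE_CONTAINER_RESOURCE = "Change Container Resource";
  public static final String CREATE_NEW_RESERVATION_REQUEST = "Create New Reservation Request";
}
{code}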
[jira] [Commented] (YARN-9102) Log Aggregation is failing with S3A FileSystem for IFile Format
[ https://issues.apache.org/jira/browse/YARN-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813330#comment-16813330 ] Adam Antal commented on YARN-9102: -- Could you please provide us with a full stack trace [~vamshikrishna.t]? > Log Aggregation is failing with S3A FileSystem for IFile Format > --- > > Key: YARN-9102 > URL: https://issues.apache.org/jira/browse/YARN-9102 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager, resourcemanager, yarn >Affects Versions: 3.1.1 >Reporter: VAMSHI KRISHNA >Priority: Major > > Log aggregation for applications is failing in Hadoop when the indexed (IFile) > log format is configured with S3A as the filesystem. The NodeManager logs > show a FileNotFoundException. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
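For anyone trying to reproduce, a rough sketch of the kind of yarn-site.xml configuration involved, assuming the IFile (IndexedFormat) log-aggregation controller is enabled and the aggregated logs are pointed at an S3A path; the controller name list and the bucket path are placeholders to be adapted to the actual cluster setup.

{code:xml}
<!-- Enable the indexed (IFile) log aggregation format alongside the default TFile. -->
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>IndexedFormat,TFile</value>
</property>
<!-- Aggregate logs onto S3A (placeholder bucket and prefix). -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>s3a://mybucket/app-logs</value>
</property>
{code}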
[jira] [Commented] (YARN-8920) LogAggregation should be configurable to allow writing to underlying storage as appOwner or yarn user
[ https://issues.apache.org/jira/browse/YARN-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813320#comment-16813320 ] Adam Antal commented on YARN-8920: -- Hi [~suma.shivaprasad], are you still working on this issue? > LogAggregation should be configurable to allow writing to underlying storage > as appOwner or yarn user > - > > Key: YARN-8920 > URL: https://issues.apache.org/jira/browse/YARN-8920 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, yarn >Reporter: Suma Shivaprasad >Assignee: Suma Shivaprasad >Priority: Major > Attachments: YARN-8920.1.patch, YARN-8920.2.patch, YARN-8920.3.patch, > YARN-8920.4.patch, YARN-8920.5.patch, YARN-8920.6.patch > > > Currently NM Log Aggregation does not support writing to the underlying storage > as the "yarn" user. This would be needed when writing to storage such as S3, > which does not support POSIX-compliant ACLs: a single access key would be used > for writes, and app owners would be allowed to read the logs with their own > access keys. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9439) Support asynchronized scheduling mode and multi-node lookup mechanism for app activities
[ https://issues.apache.org/jira/browse/YARN-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813306#comment-16813306 ] Tao Yang commented on YARN-9439: Attached v3 patch to fix UT errors. > Support asynchronized scheduling mode and multi-node lookup mechanism for app > activities > > > Key: YARN-9439 > URL: https://issues.apache.org/jira/browse/YARN-9439 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9439.001.patch, YARN-9439.002.patch, > YARN-9439.003.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.m051gyiikx7c] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9439) Support asynchronized scheduling mode and multi-node lookup mechanism for app activities
[ https://issues.apache.org/jira/browse/YARN-9439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9439: --- Attachment: YARN-9439.003.patch > Support asynchronized scheduling mode and multi-node lookup mechanism for app > activities > > > Key: YARN-9439 > URL: https://issues.apache.org/jira/browse/YARN-9439 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9439.001.patch, YARN-9439.002.patch, > YARN-9439.003.patch > > > [Design > doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.m051gyiikx7c] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-6356) Allow different values of yarn.log-aggregation.retain-seconds for succeeded and failed jobs
[ https://issues.apache.org/jira/browse/YARN-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-6356: Assignee: Adam Antal > Allow different values of yarn.log-aggregation.retain-seconds for succeeded > and failed jobs > --- > > Key: YARN-6356 > URL: https://issues.apache.org/jira/browse/YARN-6356 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Reporter: Robert Kanter >Assignee: Adam Antal >Priority: Major > > It would be useful to have a value of {{yarn.log-aggregation.retain-seconds}} > for succeeded jobs and a different value for failed/killed jobs. For jobs > that succeeded, you typically don't care about the logs, so a shorter > retention time is fine (and saves space/blocks in HDFS). For jobs that > failed or were killed, the logs are much more important, and you're likely to > want to keep them around longer so you have time to look at them. > For instance, you could set it to keep logs for succeeded jobs for 1 day and > logs for failed/killed jobs for 1 week. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
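A sketch of what the proposal could look like in yarn-site.xml. Only {{yarn.log-aggregation.retain-seconds}} exists today; the two outcome-specific property names below are hypothetical placeholders for whatever an eventual patch introduces, with values matching the 1 day / 1 week example from the description.

{code:xml}
<!-- Existing: a single retention period applied to all aggregated logs. -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>86400</value> <!-- 1 day -->
</property>

<!-- Hypothetical split (property names invented for illustration only). -->
<property>
  <name>yarn.log-aggregation.retain-seconds.succeeded</name>
  <value>86400</value> <!-- 1 day for succeeded jobs -->
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds.failed</name>
  <value>604800</value> <!-- 1 week for failed/killed jobs -->
</property>
{code}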
[jira] [Commented] (YARN-9443) Fast RM Failover using Ratis (Raft protocol)
[ https://issues.apache.org/jira/browse/YARN-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813127#comment-16813127 ] Prabhu Joseph commented on YARN-9443: - [~adam.antal] Thanks for checking this one. I plan to prepare a design doc and will update. > Fast RM Failover using Ratis (Raft protocol) > > > Key: YARN-9443 > URL: https://issues.apache.org/jira/browse/YARN-9443 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > During failover, the RM standby will have a lag as it has to recover from > the Zookeeper / FileSystem StateStore. RM HA using Ratis (Raft protocol) can > achieve fast failover as all RMs are already in sync. This is used by Ozone - > HDDS-505. > > cc [~nandakumar131] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-950) Ability to limit or avoid aggregating logs beyond a certain size
[ https://issues.apache.org/jira/browse/YARN-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-950: --- Assignee: Adam Antal > Ability to limit or avoid aggregating logs beyond a certain size > > > Key: YARN-950 > URL: https://issues.apache.org/jira/browse/YARN-950 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager >Affects Versions: 0.23.9, 2.6.0 >Reporter: Jason Lowe >Assignee: Adam Antal >Priority: Major > > It would be nice if ops could configure a cluster such that any container log > beyond a configured size would either have only a portion of the log > aggregated or not be aggregated at all. This would help speed up the recovery > path for cases where a container creates an enormous log and fills a disk, as > currently it tries to aggregate the entire, enormous log rather than only > aggregating a small portion or simply deleting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
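As a rough illustration of the ops-facing knob being asked for, a hypothetical yarn-site.xml entry; the property name and semantics are invented for this sketch and do not exist in YARN today.

{code:xml}
<!-- Hypothetical property (name invented for illustration): skip or truncate
     aggregation of any container log larger than the configured size. -->
<property>
  <name>yarn.nodemanager.log-aggregation.per-log-size-limit-bytes</name>
  <value>1073741824</value> <!-- e.g. 1 GB -->
</property>
{code}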
[jira] [Commented] (YARN-9379) Can't specify docker runtime through environment
[ https://issues.apache.org/jira/browse/YARN-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813113#comment-16813113 ] caozhiqiang commented on YARN-9379: --- Hello [~ebadger], Hadoop QA reports findbugs warnings in NodeHealthCheckerService.java, but I didn't make any changes to that file, and the findbugs version shown is v3.1.0-RC1. Could you give me some suggestions? > Can't specify docker runtime through environment > > > Key: YARN-9379 > URL: https://issues.apache.org/jira/browse/YARN-9379 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.3.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Minor > Attachments: YARN-9379-branch-3.2.0.001.patch, YARN-9379.002.patch, > YARN-9379.003.patch, YARN-9379.004.patch > > > When using Docker to run YARN containers, even though docker.allowed.runtimes > exists in container-executor.cfg, there is no parameter to specify the Docker > runtime, such as gVisor, LXC or Kata. With this patch, the client can add a > parameter such as > -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME=runsc > to specify the Docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9379) Can't specify docker runtime through environment
[ https://issues.apache.org/jira/browse/YARN-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caozhiqiang updated YARN-9379: -- Target Version/s: (was: 3.2.1) > Can't specify docker runtime through environment > > > Key: YARN-9379 > URL: https://issues.apache.org/jira/browse/YARN-9379 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 3.3.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Minor > Attachments: YARN-9379-branch-3.2.0.001.patch, YARN-9379.002.patch, > YARN-9379.003.patch, YARN-9379.004.patch > > > When using Docker to run YARN containers, even though docker.allowed.runtimes > exists in container-executor.cfg, there is no parameter to specify the Docker > runtime, such as gVisor, LXC or Kata. With this patch, the client can add a > parameter such as > -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME=runsc > to specify the Docker runtime. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
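To make the proposed usage concrete, a rough sketch under the assumptions in the description above: the runtime must already be whitelisted in container-executor.cfg via docker.allowed.runtimes, and the client then selects it per job through the new YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME environment variable. The whitelisted runtime names, example jar name and exact flag syntax are illustrative rather than taken from the patch, and other Docker runtime variables (such as the image) are omitted.

{code}
# container-executor.cfg on the NodeManager hosts ([docker] section): whitelist the
# runtimes the NM may pass to Docker. "runsc" is gVisor's runtime binary; names are placeholders.
[docker]
  docker.allowed.runtimes=runc,runsc

# Example MapReduce submission selecting the whitelisted runtime for the AM container
# (flag form follows the description; per-variable env syntax may vary by Hadoop version):
hadoop jar hadoop-mapreduce-examples.jar pi \
  -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_TYPE=docker \
  -Dyarn.app.mapreduce.am.env.YARN_CONTAINER_RUNTIME_DOCKER_RUNTIME=runsc \
  10 100
{code}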
[jira] [Commented] (YARN-9379) Can't specify docker runtime through environment
[ https://issues.apache.org/jira/browse/YARN-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813063#comment-16813063 ] Hadoop QA commented on YARN-9379: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 2 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 54s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9379 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12965271/YARN-9379.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 45cc0766f0d8 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 2d4f6b6 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/23918/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23918/testReport/ | | Max. process+thread count | 420 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-