[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-04-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815996#comment-16815996
 ] 

Prabhu Joseph commented on YARN-6929:
-

This looks good to me. Will change the code accordingly. Thanks.

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-6929-007.patch, YARN-6929-008.patch, 
> YARN-6929-009.patch, YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.2.patch, 
> YARN-6929.3.patch, YARN-6929.4.patch, YARN-6929.5.patch, YARN-6929.6.patch, 
> YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory count is 1048576 by default (HDFS-6102). 
> With a yarn.log-aggregation.retain-seconds retention of 7 days, there is a good 
> chance that LogAggregationService fails to create a new directory with 
> FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> {remote-app-log-dir}/{user}/logs/{applicationId}. This can be 
> improved by adding the date as a subdirectory, like 
> {remote-app-log-dir}/{user}/logs/{yyyy-MM-dd}/{applicationId}.
> {code}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> 

[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-04-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815991#comment-16815991
 ] 

Eric Yang commented on YARN-6929:
-

{quote}Looks like cluster_timestamp is missing; it is also required since the app 
IDs are replayed after an RM restart.{quote}

It was intentionally omitted because the appId already contains the same cluster 
timestamp, so I don't think it's necessary to repeat the same information twice.  
How about:

{code}{aggregation_log_root}/{user}/{suffix}/{bucket1}/{appId}{code}
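
For reference, a minimal sketch (illustrative only, not from any patch in this 
thread; the timestamp value is made up) of why a separate cluster_timestamp path 
component would be redundant -- the appId string already embeds it:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class AppIdTimestampDemo {
  public static void main(String[] args) {
    long clusterTimestamp = 1554968000000L; // example RM start time, not from the thread
    ApplicationId appId = ApplicationId.newInstance(clusterTimestamp, 42);

    // Prints something like "application_1554968000000_0042": the cluster
    // timestamp is recoverable from the appId-derived directory name itself.
    System.out.println(appId);
    System.out.println(appId.getClusterTimestamp()); // 1554968000000
  }
}
{code}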


[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-04-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815939#comment-16815939
 ] 

Prabhu Joseph commented on YARN-6929:
-

[~eyang] Thanks for reviewing again. I have a few doubts about the new structure:

{noformat} {aggregation_log_root}/{user}/{bucket1}_{suffix}/{appId} {noformat}

1. The user directory has suffix directories like logs and logs-ifile to separate 
the log file formats (ifile or tfile). The log file reader checks both suffix 
directories to read with the right format. With the above structure, the buckets 
are also created at the same level, and a bucket repeats for each file format as 
shown below. Not an issue, but it does not look like a clean separation.

{code}
/app-logs/ambari-qa/logs
/app-logs/ambari-qa/logs-ifile
/app-logs/ambari-qa/1234_logs
/app-logs/ambari-qa/1235_logs
...
/app-logs/ambari-qa/1234_logsifile
/app-logs/ambari-qa/1235_logsifile

{code}

2. Looks like cluster_timestamp is missing; it is also required since the app IDs 
are replayed after an RM restart.

Does the below one look good, if you are fine with having a separate suffix 
for the newer structure?

{code}

{aggregation_log_root}/{user}/bucket_{suffix}/{cluster_timestamp}/{bucket1}/{appId}

where suffix is logs, logs-ifile
   bucket1 is application#getId % 1
{code}
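
To make this concrete, a minimal sketch of how such a path could be assembled from 
an ApplicationId; the helper name and the 10000 bucket modulus are illustrative 
assumptions (the modulus above is truncated in this message):

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class ProposedLogDirSketch {
  // Sketch only: builds {aggregation_log_root}/{user}/bucket_{suffix}/
  //   {cluster_timestamp}/{bucket1}/{appId} for one application.
  static Path remoteAppLogDir(Path aggregationLogRoot, String user, String suffix,
      ApplicationId appId) {
    int bucket1 = appId.getId() % 10000;               // assumed bucket modulus
    return new Path(aggregationLogRoot,
        user + "/bucket_" + suffix + "/" + appId.getClusterTimestamp()
            + "/" + bucket1 + "/" + appId);
  }

  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1554968000000L, 123456);
    System.out.println(remoteAppLogDir(new Path("/app-logs"), "ambari-qa",
        "logs-ifile", appId));
    // -> /app-logs/ambari-qa/bucket_logs-ifile/1554968000000/3456/application_...
  }
}
{code}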



[jira] [Commented] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2019-04-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815815#comment-16815815
 ] 

Eric Yang commented on YARN-6929:
-

[~Prabhu Joseph] {code}{aggregation_log_root}/{user}/bucket_{suffix}/{cluster_timestamp}/{bucket1}/{bucket2}/{appId}{code}

I think this structure can be normalized a bit for usability.  bucket_suffix is 
static information that we can do without. cluster_timestamp, bucket2 and appId 
are mostly the same.  We only need bucket1 before cluster_timestamp to spread the 
log files into many partitions, and we only need one of cluster_timestamp or 
bucket2 because they carry redundant information.  The easier-to-use format looks 
like:

{code}{aggregation_log_root}/{user}/{bucket1}_{suffix}/{appId}{code}

This will give 640,000,000 usable directory entries.  I think this is good enough 
per user.
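
As a sketch of how this layout spreads applications across directories (the 
10000-way bucketing is an assumption for illustration, not a value fixed in this 
thread):

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class BucketedLayoutSketch {
  public static void main(String[] args) {
    Path root = new Path("/app-logs");
    for (int id = 9998; id <= 10002; id++) {
      ApplicationId appId = ApplicationId.newInstance(1554968000000L, id);
      int bucket1 = appId.getId() % 10000;       // assumed bucket modulus
      System.out.println(new Path(root, "ambari-qa/" + bucket1 + "_logs/" + appId));
    }
    // ids 9998..10002 land in buckets 9998, 9999, 0, 1, 2, so no single
    // directory accumulates more than roughly (total apps / bucket count) entries.
  }
}
{code}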


[jira] [Updated] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Giovanni Matteo Fumarola (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-9435:
---
Fix Version/s: 3.3.0

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in the 
> ResourceManager. As part of this jira, we will add metrics like the number of 
> allocated opportunistic containers, released opportunistic containers, node-level 
> allocations, rack-level allocations, etc. for the Opportunistic Scheduler.
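
As a rough illustration of what such ResourceManager-side metrics could look like, 
a sketch using the Hadoop metrics2 annotations; the class and counter names below 
are assumptions, not necessarily what the YARN-9435 patch adds:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "Opportunistic scheduler metrics", context = "yarn")
public class OpportunisticSchedulerMetricsSketch {
  @Metric("# of allocated opportunistic containers")
  MutableCounterLong allocatedOpportunisticContainers;

  @Metric("# of released opportunistic containers")
  MutableCounterLong releasedOpportunisticContainers;

  @Metric("# of node-local opportunistic allocations")
  MutableCounterLong nodeLocalAllocations;

  @Metric("# of rack-local opportunistic allocations")
  MutableCounterLong rackLocalAllocations;

  // Annotated mutable metric fields are instantiated by the metrics system
  // when the source is registered.
  static OpportunisticSchedulerMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register(new OpportunisticSchedulerMetricsSketch());
  }
}
{code}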



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815713#comment-16815713
 ] 

Giovanni Matteo Fumarola commented on YARN-9435:


Thanks [~abmodi] for the patch.
Committed to trunk.







[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815711#comment-16815711
 ] 

Hudson commented on YARN-9435:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16384 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16384/])
YARN-9435. Add Opportunistic Scheduler metrics in ResourceManager. (gifuma: rev 
ed3747c1ccc303e206de50c2b74f3c318cb1c416)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/OpportunisticContainerAllocatorAMService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/scheduler/OpportunisticContainerAllocator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestOpportunisticContainerAllocatorAMService.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/OpportunisticSchedulerMetrics.java








[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try

2019-04-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815709#comment-16815709
 ] 

Eric Yang commented on YARN-7848:
-

[~Jim_Brennan] Thank you for the tips.  [~ebadger], can you commit the patch if it 
looks good to you?  Thanks

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7848.001.patch, YARN-7848.002.patch, 
> YARN-7848.003.patch, YARN-7848.004.patch
>
>
> After the addition of YARN-5366, containers will get removed after a certain 
> debug delay. However, this is a one-time effort. If the removal fails for 
> whatever reason, the container will persist. We need to add a mechanism for a 
> forced removal of those containers.
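
A minimal sketch of the retry-then-force idea (illustrative only; the actual 
YARN-7848 change goes through the container-executor / DockerCommandExecutor path 
rather than ProcessBuilder):

{code}
import java.io.IOException;

public class ForcedDockerRemovalSketch {
  // Try a normal "docker rm" first; if it fails, fall back to "docker rm -f".
  static void removeContainer(String containerId)
      throws IOException, InterruptedException {
    if (run("docker", "rm", containerId) != 0) {
      run("docker", "rm", "-f", containerId);   // forced removal on second attempt
    }
  }

  private static int run(String... cmd) throws IOException, InterruptedException {
    return new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }
}
{code}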






[jira] [Commented] (YARN-9448) Fix Opportunistic Scheduling for node local allocations.

2019-04-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815652#comment-16815652
 ] 

Íñigo Goiri commented on YARN-9448:
---

Thanks [~abmodi]; what about extracting the {{allocationRequestId}} values and 
using them in the assertions?
This may also improve readability by separating the ids with comments.

> Fix Opportunistic Scheduling for node local allocations.
> 
>
> Key: YARN-9448
> URL: https://issues.apache.org/jira/browse/YARN-9448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9448.001.patch, YARN-9448.002.patch
>
>
> Right now, an opportunistic container might not get allocated on a rack-local node 
> even if it's available.
> Nodes are currently blacklisted if any container except a node-local container 
> is allocated on that node. If a container was previously allocated on that node, 
> the node won't even be considered when there is an ask for a node-local 
> request. 
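
Illustrative sketch of the intended behavior (not the actual YARN-9448 fix): a host 
that already received containers should only be filtered out for non node-local 
asks, while an explicit node-local ask for that host must still consider it.

{code}
import java.util.Set;

public class OpportunisticNodeSelectionSketch {
  static boolean canUseNode(String nodeHost, boolean askIsNodeLocalForHost,
      Set<String> nodesAlreadyUsedThisRound) {
    // node-local asks bypass the "spread allocations across nodes" filter
    return askIsNodeLocalForHost || !nodesAlreadyUsedThisRound.contains(nodeHost);
  }
}
{code}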






[jira] [Commented] (YARN-2889) Limit the number of opportunistic container allocated per AM heartbeat

2019-04-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815648#comment-16815648
 ] 

Íñigo Goiri commented on YARN-2889:
---

Thanks [~abmodi] for the patch.
A few comments:
* Can we use "opportunistic containers" in {{yarn-default.xml}}?
* The main logic seems to be in {{OpportunisticContainerAllocator}}; it could 
use some logging (both info and debug) to understand why requests are getting 
capped.
* In {{TestOpportunisticContainerAllocator}}, we could extract 
{{ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true)}} as 
well as {{Priority.newInstance(1)}} into constants.
* The tests in {{TestOpportunisticContainerAllocator}} could use some comments 
and probably even a javadoc explaining what is expected.
* In {{TestOpportunisticContainerAllocator}}, the System.out should be a LOG 
(together with the comments).

> Limit the number of opportunistic container allocated per AM heartbeat
> --
>
> Key: YARN-2889
> URL: https://issues.apache.org/jira/browse/YARN-2889
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-2889.001.patch, YARN-2889.002.patch
>
>
> We introduce a way to limit the number of opportunistic containers that will 
> be allocated on each AM heartbeat.
>  This way we can restrict the number of opportunistic containers handed out 
> by the system, as well as throttle down misbehaving AMs (asking for too many 
> opportunistic containers).
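
A minimal sketch of the per-heartbeat throttling idea described above (names and 
the cap value are assumptions, not the actual YARN-2889 implementation):

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;

public class OpportunisticAllocationCapSketch {
  // assumed cap for illustration; in practice this would come from configuration
  static final int MAX_OPPORTUNISTIC_PER_AM_HEARTBEAT = 32;

  // Hand out at most the configured number of opportunistic containers per AM
  // heartbeat; the remaining asks simply wait for a later heartbeat.
  static List<Container> capAllocation(List<Container> proposed) {
    if (proposed.size() <= MAX_OPPORTUNISTIC_PER_AM_HEARTBEAT) {
      return proposed;
    }
    return proposed.subList(0, MAX_OPPORTUNISTIC_PER_AM_HEARTBEAT);
  }
}
{code}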






[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815646#comment-16815646
 ] 

Abhishek Modi commented on YARN-9339:
-

Thanks [~elgoiri] for the review. I will address the comments in the next patch.

> Apps pending metric incorrect after moving app to a new queue
> -
>
> Key: YARN-9339
> URL: https://issues.apache.org/jira/browse/YARN-9339
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Abhishek Modi
>Priority: Minor
> Attachments: YARN-9339.001.patch, YARN-9339.002.patch
>
>
> I observed a cluster that had a high Apps Pending count that appeared to be 
> incorrect. This seemed to be related to apps being moved to different queues. 
> I tested by adding some logging to TestCapacityScheduler#testMoveAppBasic 
> before and after a moveApplication call. Before the call appsPending was 1 
> and afterwards appsPending was 2.






[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815645#comment-16815645
 ] 

Hadoop QA commented on YARN-9052:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 81 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  4m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
45s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
45s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
35s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  8m 
27s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  8m 27s{color} 
| {color:red} root in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
4m 30s{color} | {color:orange} root: The patch generated 119 new + 1841 
unchanged - 37 fixed = 1960 total (was 1878) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
54s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
36s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
39s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  3m  
1s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
41s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch 
failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
29s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} hadoop-mapreduce-client-app in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 51s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 33s{color} 
| {color:red} hadoop-yarn-client in the 

[jira] [Commented] (YARN-9339) Apps pending metric incorrect after moving app to a new queue

2019-04-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815641#comment-16815641
 ] 

Íñigo Goiri commented on YARN-9339:
---

Thanks [~abmodi] for the patch.
A couple comments:
* Why is ParentQueue#submitApplicationAttempt() empty? We should have a better 
comment there.
* Can we have extended unit tests? Right now we only check for 1 app, and we 
don't check for 0 beforehand.







[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-04-11 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815623#comment-16815623
 ] 

Giovanni Matteo Fumarola commented on YARN-9462:


Thanks [~Prabhu Joseph] for the patch.
I am not a fan of increasing time values to avoid test failures, because it 
hides the real problem.

Please run the test multiple times to verify your fix, e.g. with a bash script, 
with IntelliJ, or with the Repeat annotation.

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Commented] (YARN-7537) [Atsv2] load hbase configuration from filesystem rather than URL

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815581#comment-16815581
 ] 

Hadoop QA commented on YARN-7537:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} pathlen {color} | {color:red}  0m  
0s{color} | {color:red} The patch appears to contain 1 files with names longer 
than 240 {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 56m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-7537 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12965610/YARN-7537-03.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d16ca1f1abb2 4.4.0-144-generic #170~14.04.1-Ubuntu SMP Mon Mar 
18 15:02:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0468c5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| pathlen | 
https://builds.apache.org/job/PreCommit-YARN-Build/23940/artifact/out/pathlen.txt
 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23940/testReport/ |
| Max. process+thread count | 310 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-9337) GPU auto-discovery script runs even when the resource is given by hand

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815568#comment-16815568
 ] 

Hadoop QA commented on YARN-9337:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
55s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in trunk has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 2 new + 3 unchanged - 0 fixed = 5 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 43s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9337 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12965607/YARN-9337.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4686a4753a11 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0468c5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/23938/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
 |
| checkstyle | 

[jira] [Commented] (YARN-9453) Clean up code long if-else chain in ApplicationCLI#run

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815543#comment-16815543
 ] 

Hadoop QA commented on YARN-9453:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 13s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 3 new + 
12 unchanged - 1 fixed = 15 total (was 13) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 26 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 10s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m 48s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestYarnCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9453 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12965601/YARN-9453.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 81d351321270 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0468c5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23937/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| whitespace | 

[jira] [Commented] (YARN-7537) [Atsv2] load hbase configuration from filesystem rather than URL

2019-04-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815513#comment-16815513
 ] 

Prabhu Joseph commented on YARN-7537:
-

Have used {{FileSystem.newInstance(uri, conf)}}, which returns a FileSystem based 
on the scheme of the URI path. Tested with the file, hdfs and s3 filesystems.
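
Roughly the pattern being described, as a sketch (not the actual YARN-7537 patch; 
the helper name is illustrative):

{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HBaseConfFromFileSystemSketch {
  // Load a configuration XML from any FileSystem (file://, hdfs://, s3a://, ...);
  // FileSystem.newInstance() picks the implementation from the URI scheme.
  static Configuration loadHBaseConf(URI hbaseConfFile, Configuration yarnConf)
      throws IOException {
    Configuration hbaseConf = new Configuration(false);
    FileSystem fs = FileSystem.newInstance(hbaseConfFile, yarnConf);
    hbaseConf.addResource(fs.open(new Path(hbaseConfFile)));
    return hbaseConf;
  }
}
{code}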

> [Atsv2] load hbase configuration from filesystem rather than URL
> 
>
> Key: YARN-7537
> URL: https://issues.apache.org/jira/browse/YARN-7537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7537-03.patch, YARN-7537.01.patch, 
> YARN-7537.02.patch
>
>
> Currently HBaseTimelineStorageUtils#getTimelineServiceHBaseConf loads hbase 
> configurations using a URL if *yarn.timeline-service.hbase.configuration.file* 
> is configured, but it is restricted to URLs only. This needs to be changed to 
> load from a file system. In a deployment, the hbase configuration can be kept 
> on a filesystem so that it can be utilized by all the NodeManagers and the 
> ResourceManager.
> cc :/ [~vrushalic] [~varun_saxena]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7537) [Atsv2] load hbase configuration from filesystem rather than URL

2019-04-11 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-7537:

Attachment: YARN-7537-03.patch

> [Atsv2] load hbase configuration from filesystem rather than URL
> 
>
> Key: YARN-7537
> URL: https://issues.apache.org/jira/browse/YARN-7537
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7537-03.patch, YARN-7537.01.patch, 
> YARN-7537.02.patch
>
>
> Currently HBaseTimelineStorageUtils#getTimelineServiceHBaseConf loads hbase 
> configurations using a URL if *yarn.timeline-service.hbase.configuration.file* 
> is configured, but it is restricted to URLs only. This needs to be changed to 
> load from a file system. In a deployment, the hbase configuration can be kept 
> on a filesystem so that it can be utilized by all the NodeManagers and the 
> ResourceManager.
> cc :/ [~vrushalic] [~varun_saxena]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9337) GPU auto-discovery script runs even when the resource is given by hand

2019-04-11 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9337:
-
Attachment: YARN-9337.003.patch

> GPU auto-discovery script runs even when the resource is given by hand
> --
>
> Key: YARN-9337
> URL: https://issues.apache.org/jira/browse/YARN-9337
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9337.001.patch, YARN-9337.002.patch, 
> YARN-9337.003.patch
>
>
> The nvidia-smi script is called even when the gpu configs are given by hand 
> (so there's no need for GPU auto-discovery).
> We should avoid calling that script, since it has no effect. (The configs 
> written by the user are not overwritten by the result of the auto-discovery 
> script.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-04-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9052:
-
Attachment: YARN-9052.003.patch

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch, 
> YARN-9052.003.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as 
> parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9337) GPU auto-discovery script runs even when the resource is given by hand

2019-04-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815486#comment-16815486
 ] 

Adam Antal commented on YARN-9337:
--

Accidentally pushed a wrong patch, uploaded the correct one as v3.

> GPU auto-discovery script runs even when the resource is given by hand
> --
>
> Key: YARN-9337
> URL: https://issues.apache.org/jira/browse/YARN-9337
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9337.001.patch, YARN-9337.002.patch
>
>
> The nvidia-smi script is called even when the gpu configs are given by hand 
> (so there's no need for GPU auto-discovery).
> We should avoid calling that script, since it has no effect. (The configs 
> written by the user are not overwritten by the result of the auto-discovery 
> script.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9337) GPU auto-discovery script runs even when the resource is given by hand

2019-04-11 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9337:
-
Attachment: YARN-9337.002.patch

> GPU auto-discovery script runs even when the resource is given by hand
> --
>
> Key: YARN-9337
> URL: https://issues.apache.org/jira/browse/YARN-9337
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9337.001.patch, YARN-9337.002.patch
>
>
> The nvidia-smi script is called even when the gpu configs are given by hand 
> (so there's no need for GPU auto-discovery).
> We should avoid calling that script, since it has no effect. (The configs 
> written by the user are not overwritten by the result of the auto-discovery 
> script.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9430) Recovering containers does not check available resources on node

2019-04-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815477#comment-16815477
 ] 

Szilard Nemeth commented on YARN-9430:
--

Nope, I intentionally did not assign this to myself; anyone can pick it up!

> Recovering containers does not check available resources on node
> 
>
> Key: YARN-9430
> URL: https://issues.apache.org/jira/browse/YARN-9430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Critical
>
> I have a testcase that checks that if some GPU devices go offline and 
> recovery happens, only the containers that fit into the node's resources 
> will be recovered. Unfortunately, this is not the case: the RM does not 
> check the available resources on the node during recovery.
> *Detailed explanation:*
> *Testcase:* 
>  1. There are 2 nodes running NodeManagers
>  2. nvidia-smi is replaced with a fake bash script that reports 2 GPU devices 
> per node, initially. This means 4 GPU devices in the cluster altogether.
>  3. RM / NM recovery is enabled
>  4. The test starts off a sleep job, requesting 4 containers, 1 GPU device 
> for each (AM does not request GPUs)
>  5. Before restart, the fake bash script is adjusted to report 1 GPU device 
> per node (2 in the cluster) after restart.
>  6. Restart is initiated.
>  
> *Expected behavior:* 
>  After restart, only the AM and 2 normal containers should have been started, 
> as there are only 2 GPU devices in the cluster.
>  
> *Actual behaviour:* 
>  AM + 4 containers are allocated, this is all containers started originally 
> with step 4.
> App id was: 1553977186701_0001
> *Logs*:
>  
> {code:java}
> 2019-03-30 13:22:30,299 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Processing event for appattempt_1553977186701_0001_01 of type RECOVER
> 2019-03-30 13:22:30,366 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added Application Attempt appattempt_1553977186701_0001_01 to scheduler 
> from user: systest
>  2019-03-30 13:22:30,366 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> appattempt_1553977186701_0001_01 is recovering. Skipping notifying 
> ATTEMPT_ADDED
>  2019-03-30 13:22:30,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1553977186701_0001_01 State change from NEW to LAUNCHED on 
> event = RECOVER
> 2019-03-30 13:22:33,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_01, 
> CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_04, 
> CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
> Assigned container container_e84_1553977186701_0001_01_04 of capacity 
>  on host 
> snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers,  vCores:2, yarn.io/gpu: 1> used and  available after 
> allocation
> 2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_05, 
> CreateTime: 1553977272803, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
>  2019-03-30 13:22:33,276 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Processing container_e84_1553977186701_0001_01_05 of type RECOVER
>  2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e84_1553977186701_0001_01_05 Container Transitioned from NEW to 
> RUNNING
>  2019-03-30 13:22:33,276 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
> Assigned container container_e84_1553977186701_0001_01_05 of capacity 
>  on host 
> snemeth-gpu-2.vpc.cloudera.com:8041, which has 3 containers,  vCores:3, yarn.io/gpu: 2> used and  
> available after allocation
> 2019-03-30 13:22:33,279 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_03, 
> CreateTime: 1553977272166, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> 

[jira] [Updated] (YARN-9473) [Umbrella] Support Vector Engine ( a new accelerator hardware) based on pluggable device framework

2019-04-11 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9473:
---
Summary: [Umbrella] Support Vector Engine ( a new accelerator hardware) 
based on pluggable device framework  (was: [Umbrella] Support Vector Engine ( a 
new acceleration hardware) based on pluggable device framework)

> [Umbrella] Support Vector Engine ( a new accelerator hardware) based on 
> pluggable device framework
> --
>
> Key: YARN-9473
> URL: https://issues.apache.org/jira/browse/YARN-9473
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zhankun Tang
>Priority: Major
>
> As the heterogeneous computing trend rises, new acceleration hardware such 
> as GPUs and FPGAs is used to satisfy various requirements.
> The Vector Engine (VE), a new piece of hardware released by NEC, is another 
> example. The VE is like a GPU but has different characteristics: it is 
> suitable for machine learning and HPC due to better memory bandwidth and no 
> PCIe bottleneck.
> Please check here for more VE details:
> [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/]
> [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf]
> As we know, YARN-8851 is a pluggable device framework which provides an easy 
> way to develop a plugin for such new accelerators. This JIRA proposes to 
> develop a new VE plugin based on that framework, implemented in the same way 
> as the current GPU plugin "NvidiaGPUPluginForRuntimeV2".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9473) [Umbrella] Support Vector Engine ( a new acceleration hardware) based on pluggable device framework

2019-04-11 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9473:
--

 Summary: [Umbrella] Support Vector Engine ( a new acceleration 
hardware) based on pluggable device framework
 Key: YARN-9473
 URL: https://issues.apache.org/jira/browse/YARN-9473
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Zhankun Tang


As the heterogeneous computing trend rises, new acceleration hardware such as 
GPUs and FPGAs is used to satisfy various requirements.

The Vector Engine (VE), a new piece of hardware released by NEC, is another 
example. The VE is like a GPU but has different characteristics: it is suitable 
for machine learning and HPC due to better memory bandwidth and no PCIe 
bottleneck.

Please check here for more VE details:

[https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/]

[https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf]

As we know, YARN-8851 is a pluggable device framework which provides an easy 
way to develop a plugin for such new accelerators. This JIRA proposes to 
develop a new VE plugin based on that framework, implemented in the same way as 
the current GPU plugin "NvidiaGPUPluginForRuntimeV2".

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7848) Force removal of docker containers that do not get removed on first try

2019-04-11 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815434#comment-16815434
 ] 

Jim Brennan commented on YARN-7848:
---

Thanks for updating [~eyang]!  lgtm.   I am +1 on patch 004 (non-binding). 

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7848.001.patch, YARN-7848.002.patch, 
> YARN-7848.003.patch, YARN-7848.004.patch
>
>
> After the addition of YARN-5366, containers will get removed after a certain 
> debug delay. However, this is a one-time effort. If the removal fails for 
> whatever reason, the container will persist. We need to add a mechanism for a 
> forced removal of those containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9430) Recovering containers does not check available resources on node

2019-04-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815422#comment-16815422
 ] 

Adam Antal commented on YARN-9430:
--

Thanks [~snemeth] for the recap.

I have nothing more to add. I suppose it should be an exception, but I am not 
sure whether it should block RM startup. Let's implement the check with 
exceptions thrown, plus testcases, and we can discuss it later. Are you 
planning to work on it? In that case, feel free to assign it to yourself.

> Recovering containers does not check available resources on node
> 
>
> Key: YARN-9430
> URL: https://issues.apache.org/jira/browse/YARN-9430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Critical
>
> I have a testcase that checks that if some GPU devices go offline and 
> recovery happens, only the containers that fit into the node's resources 
> will be recovered. Unfortunately, this is not the case: the RM does not 
> check the available resources on the node during recovery.
> *Detailed explanation:*
> *Testcase:* 
>  1. There are 2 nodes running NodeManagers
>  2. nvidia-smi is replaced with a fake bash script that reports 2 GPU devices 
> per node, initially. This means 4 GPU devices in the cluster altogether.
>  3. RM / NM recovery is enabled
>  4. The test starts off a sleep job, requesting 4 containers, 1 GPU device 
> for each (AM does not request GPUs)
>  5. Before restart, the fake bash script is adjusted to report 1 GPU device 
> per node (2 in the cluster) after restart.
>  6. Restart is initiated.
>  
> *Expected behavior:* 
>  After restart, only the AM and 2 normal containers should have been started, 
> as there are only 2 GPU devices in the cluster.
>  
> *Actual behaviour:* 
>  AM + 4 containers are allocated, this is all containers started originally 
> with step 4.
> App id was: 1553977186701_0001
> *Logs*:
>  
> {code:java}
> 2019-03-30 13:22:30,299 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Processing event for appattempt_1553977186701_0001_01 of type RECOVER
> 2019-03-30 13:22:30,366 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added Application Attempt appattempt_1553977186701_0001_01 to scheduler 
> from user: systest
>  2019-03-30 13:22:30,366 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> appattempt_1553977186701_0001_01 is recovering. Skipping notifying 
> ATTEMPT_ADDED
>  2019-03-30 13:22:30,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1553977186701_0001_01 State change from NEW to LAUNCHED on 
> event = RECOVER
> 2019-03-30 13:22:33,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_01, 
> CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_04, 
> CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
> Assigned container container_e84_1553977186701_0001_01_04 of capacity 
>  on host 
> snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers,  vCores:2, yarn.io/gpu: 1> used and  available after 
> allocation
> 2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_05, 
> CreateTime: 1553977272803, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
>  2019-03-30 13:22:33,276 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Processing container_e84_1553977186701_0001_01_05 of type RECOVER
>  2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e84_1553977186701_0001_01_05 Container Transitioned from NEW to 
> RUNNING
>  2019-03-30 13:22:33,276 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
> Assigned container container_e84_1553977186701_0001_01_05 of capacity 
>  on host 
> snemeth-gpu-2.vpc.cloudera.com:8041, which has 3 containers,  vCores:3, yarn.io/gpu: 2> used and  
> available after allocation
> 2019-03-30 13:22:33,279 INFO 
> 

[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-04-11 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815413#comment-16815413
 ] 

Prabhu Joseph commented on YARN-9462:
-

[~giovanni.fumarola]  Could you please review this if you have some time? This 
fixes TestResourceTrackerService.testNodeRemovalGracefully failing 
intermittently.

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815406#comment-16815406
 ] 

Hadoop QA commented on YARN-9462:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m  
2s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9462 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12965583/YARN-9462-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b3dc280402c0 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bdbca0e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/23936/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23936/testReport/ |
| Max. process+thread count | 904 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9140) Code cleanup in ResourcePluginManager.initialize and in TestResourcePluginManager

2019-04-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815352#comment-16815352
 ] 

Adam Antal commented on YARN-9140:
--

Thanks for the patch [~snemeth], +1 (non-binding).

> Code cleanup in ResourcePluginManager.initialize and in 
> TestResourcePluginManager
> -
>
> Key: YARN-9140
> URL: https://issues.apache.org/jira/browse/YARN-9140
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Trivial
> Attachments: YARN-9140.001.patch, YARN-9140.002.patch
>
>
> Some code cleanup is needed in ResourcePluginManager#initialize: 
>  * There's a big code block that initializes resource plugins; this should be 
> extracted to a separate method.
>  * Exception handling could be simplified.
> TestResourcePluginManager minor cleanup: 
>  * Exceptions that are never thrown could be removed from method signatures.
>  * verify(obj, times(1)).() calls: the times(1) parameter could be 
> dropped, as it is the default if verify(obj) is invoked without the times 
> parameter.
>  * Some code exceeds the 80 character column limit.
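
On the {{verify(obj, times(1))}} point above, a minimal Mockito sketch (the 
mocked {{List}} is just an arbitrary example, not code from the patch) showing 
that the two forms are equivalent:

{code:java}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

import java.util.List;

public class VerifyDefaultModeExample {
  public static void main(String[] args) {
    @SuppressWarnings("unchecked")
    List<String> mockedList = mock(List.class);

    mockedList.add("one");

    // Explicit verification mode.
    verify(mockedList, times(1)).add("one");
    // times(1) is the default, so this is the same check.
    verify(mockedList).add("one");
  }
}
{code}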



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815335#comment-16815335
 ] 

Abhishek Modi commented on YARN-9435:
-

None of the findbugs warnings is related to the patch.

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for Opportunistic Scheduler at 
> ResourceManager. As part of this jira, we will add metrics like number of 
> allocated opportunistic containers, released opportunistic containers, node 
> level allocations, rack level allocations etc. for Opportunistic Scheduler.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-04-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815322#comment-16815322
 ] 

Szilard Nemeth commented on YARN-9052:
--

Hi [~adam.antal]!

Thanks for the review!
Yep, I agree that this is a massive one, and I also think that the rebase could 
be massive.
I would like to get this in as I think it's a useful patch to have, but I can't 
guarantee I'll have time in the near future (coming weeks).
After I finish some ongoing Submarine development, I can work on this :) 

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as 
> parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815321#comment-16815321
 ] 

Hadoop QA commented on YARN-9052:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} YARN-9052 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9052 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12949432/YARN-9052.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23935/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as 
> parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-04-11 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9462:

Attachment: (was: YARN-9462-001.patch)

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9462) TestResourceTrackerService.testNodeRemovalGracefully fails sporadically

2019-04-11 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9462:

Attachment: YARN-9462-001.patch

> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> ---
>
> Key: YARN-9462
> URL: https://issues.apache.org/jira/browse/YARN-9462
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: 
> TestResourceTrackerService.testNodeRemovalGracefully.txt, YARN-9462-001.patch
>
>
> TestResourceTrackerService.testNodeRemovalGracefully fails sporadically
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtilDecomToUntracked(TestResourceTrackerService.java:2318)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalUtil(TestResourceTrackerService.java:2280)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testNodeRemovalGracefully(TestResourceTrackerService.java:2133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9052) Replace all MockRM submit method definitions with a builder

2019-04-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815311#comment-16815311
 ] 

Adam Antal commented on YARN-9052:
--

Thanks for the patch [~snemeth], massive one! I wanted to have some more 
meaningful comments, but I can't effectively analyse your patch without an IDE, 
and it looks like your patch does not apply to the current trunk. This is not 
surprising, as you have touched a lot of files.

- Please rebase (I am sorry).
- The patch still has 119 new checkstyle issues.
- In {{testValidateRequestCapacityAgainstMinMaxAllocation}} and 
{{testRequestCapacityMinMaxAllocationForResourceTypes}} there is no need to 
create a separate {{MockRMAppSubmissionData}} object (you can just create it 
inline, as you did the other times with the {{MockRM}} creation via 
{{MockRMAppSubmitter}}).
- In {{security/TestDelegationTokenRenewer.java}} you do not need the 
{{Resource}} object - the call with the memory automatically does what you 
want. Also: I could not read your whole patch, but do you actually need the 
{{createWithDefaults}} variant with the {{Resource}} object? I only ever see 
the memory being used.
Speaking about *Builders*: I think there should be only one 
{{createWithDefaults}} function, which accepts only one parameter: a 
{{MockRM}}. The other two variants of the function (called with memory or 
resource) can be added like the other parameters: create a {{withMemory}} and a 
{{withResource}} function that provide these parameters to the builder. If the 
logic of the class requires that one of them must be given, then this 
condition should be checked in the build() function. 
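
To make that concrete, here is a rough sketch of the builder shape suggested 
above; the class name {{MockRMAppSubmissionData}} comes from the patch, but the 
fields and method bodies below are assumptions for the discussion, not the 
actual implementation:

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class MockRMAppSubmissionData {
  private final MockRM mockRM;
  private final Resource resource;

  private MockRMAppSubmissionData(Builder b) {
    this.mockRM = b.mockRM;
    this.resource = b.resource;
  }

  // Single entry point: only the MockRM is mandatory up front.
  public static Builder createWithDefaults(MockRM mockRM) {
    return new Builder(mockRM);
  }

  public static final class Builder {
    private final MockRM mockRM;
    private Resource resource;

    private Builder(MockRM mockRM) {
      this.mockRM = mockRM;
    }

    // Memory-only callers get a Resource derived from the memory value.
    public Builder withMemory(int memoryMb) {
      this.resource = Resources.createResource(memoryMb);
      return this;
    }

    public Builder withResource(Resource resource) {
      this.resource = resource;
      return this;
    }

    public MockRMAppSubmissionData build() {
      // The "one of memory / resource must be given" rule lives in one place.
      if (resource == null) {
        throw new IllegalStateException(
            "Either withMemory() or withResource() must be called");
      }
      return new MockRMAppSubmissionData(this);
    }
  }
}
{code}

With that shape a test would only write something like 
{{MockRMAppSubmissionData.createWithDefaults(rm).withMemory(200).build()}}, and 
the memory-vs-resource choice is validated in a single build() method.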

> Replace all MockRM submit method definitions with a builder
> ---
>
> Key: YARN-9052
> URL: https://issues.apache.org/jira/browse/YARN-9052
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9052.001.patch, YARN-9052.002.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than an 
> acceptable number of parameters, ranging from 2 to as many as 22, which 
> makes the code completely unreadable.
> On top of the unreadability, it's very hard to follow what RmApp will be 
> produced for tests, as they often pass a lot of empty / null values as 
> parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9430) Recovering containers does not check available resources on node

2019-04-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815280#comment-16815280
 ] 

Szilard Nemeth commented on YARN-9430:
--

As per our further offline discussion with [~adam.antal], our approach could be 
the following, so these are the TODOs left here: 
1. If the NM starts up with fewer resources than it had allocated to containers 
before recovery, we need to fail fast. 
A runtime exception should be thrown the first time we realize we have fewer 
resources than the sum of the resources that would be allocated to the 
recovered containers. 
This situation can arise if the amount of vcores / memory 
(yarn.nodemanager.resource.cpu-vcores / yarn.nodemanager.resource.memory-mb) is 
reduced or some GPU device goes offline, so the node's resources are less than 
the would-be allocated resources.
2. If we think an exception is too heavy a solution, we should fall back to an 
error log as a last resort.

[~adam.antal]: Anything to add? 
Thanks!
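
A minimal sketch of the fail-fast check in point 1, assuming hypothetical 
helper and method names (this is not part of any attached patch):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;
import org.apache.hadoop.yarn.util.resource.Resources;

final class RecoveryCapacityCheck {

  /**
   * Fail fast if the containers being recovered need more than the node's
   * currently reported capability (e.g. after a GPU went offline).
   */
  static void verifyRecoveredContainersFit(Resource nodeCapability,
      Iterable<Resource> recoveredContainerResources) {
    Resource required = Resource.newInstance(0, 0);
    for (Resource r : recoveredContainerResources) {
      Resources.addTo(required, r);
    }
    if (!Resources.fitsIn(required, nodeCapability)) {
      throw new YarnRuntimeException("Recovered containers require " + required
          + " but the node only reports " + nodeCapability
          + " after restart; refusing to recover all containers.");
    }
  }
}
{code}

Whether this throws (fail-fast, point 1) or only logs an error (point 2) could 
then be controlled in a single place.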

> Recovering containers does not check available resources on node
> 
>
> Key: YARN-9430
> URL: https://issues.apache.org/jira/browse/YARN-9430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Critical
>
> I have a testcase that checks that if some GPU devices go offline and 
> recovery happens, only the containers that fit into the node's resources 
> will be recovered. Unfortunately, this is not the case: the RM does not 
> check the available resources on the node during recovery.
> *Detailed explanation:*
> *Testcase:* 
>  1. There are 2 nodes running NodeManagers
>  2. nvidia-smi is replaced with a fake bash script that reports 2 GPU devices 
> per node, initially. This means 4 GPU devices in the cluster altogether.
>  3. RM / NM recovery is enabled
>  4. The test starts off a sleep job, requesting 4 containers, 1 GPU device 
> for each (AM does not request GPUs)
>  5. Before restart, the fake bash script is adjusted to report 1 GPU device 
> per node (2 in the cluster) after restart.
>  6. Restart is initiated.
>  
> *Expected behavior:* 
>  After restart, only the AM and 2 normal containers should have been started, 
> as there are only 2 GPU devices in the cluster.
>  
> *Actual behaviour:* 
>  AM + 4 containers are allocated, this is all containers started originally 
> with step 4.
> App id was: 1553977186701_0001
> *Logs*:
>  
> {code:java}
> 2019-03-30 13:22:30,299 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Processing event for appattempt_1553977186701_0001_01 of type RECOVER
> 2019-03-30 13:22:30,366 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Added Application Attempt appattempt_1553977186701_0001_01 to scheduler 
> from user: systest
>  2019-03-30 13:22:30,366 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> appattempt_1553977186701_0001_01 is recovering. Skipping notifying 
> ATTEMPT_ADDED
>  2019-03-30 13:22:30,367 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1553977186701_0001_01 State change from NEW to LAUNCHED on 
> event = RECOVER
> 2019-03-30 13:22:33,257 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_01, 
> CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_04, 
> CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: 
> Assigned container container_e84_1553977186701_0001_01_04 of capacity 
>  on host 
> snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers,  vCores:2, yarn.io/gpu: 1> used and  available after 
> allocation
> 2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler:
>  Recovering container [container_e84_1553977186701_0001_01_05, 
> CreateTime: 1553977272803, Version: 0, State: RUNNING, Capability: 
> , Diagnostics: , ExitStatus: -1000, 
> NodeLabelExpression: Priority: 0]
>  2019-03-30 13:22:33,276 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Processing container_e84_1553977186701_0001_01_05 of type RECOVER
>  2019-03-30 13:22:33,276 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> 

[jira] [Commented] (YARN-7721) TestContinuousScheduling fails sporadically with NPE

2019-04-11 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815263#comment-16815263
 ] 

Adam Antal commented on YARN-7721:
--

{{SchedulerApplicationAttempt#getLastScheduledContainer}} is also used in 
{{SchedulerApplicationAttempt#transferStateFromPreviousAttempt}}. Could this 
issue appear there as well? I mean, if the test needs to wait for some time 
here, do we have to wait there as well?

> TestContinuousScheduling fails sporadically with NPE
> 
>
> Key: YARN-7721
> URL: https://issues.apache.org/jira/browse/YARN-7721
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Jason Lowe
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-7721.001.patch
>
>
> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime is 
> failing sporadically with an NPE in precommit builds, and I can usually 
> reproduce it locally after a few tries:
> {noformat}
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.085 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:383)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> [...]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815254#comment-16815254
 ] 

Hadoop QA commented on YARN-9435:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
15s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in trunk has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
46s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 81m  
5s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9435 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12965554/YARN-9435.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b59faeb92442 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 586826f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| findbugs | 

[jira] [Commented] (YARN-9472) Add multi-thread asynchronous scheduling to fair scheduler

2019-04-11 Thread zhuqi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815238#comment-16815238
 ] 

zhuqi commented on YARN-9472:
-

[~leftnoteasy]  [~Tao Yang] 

Any suggestions would be appreciated.

Thanks.

> Add multi-thread asynchronous scheduling to fair scheduler
> --
>
> Key: YARN-9472
> URL: https://issues.apache.org/jira/browse/YARN-9472
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler, resourcemanager
>Reporter: zhuqi
>Priority: Major
>
> The capacity scheduler now has multi-threaded asynchronous scheduling; I 
> think the fair scheduler needs to support it as well. 
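
For context, a minimal sketch of the general shape such an asynchronous 
scheduling loop could take, loosely modeled on the capacity scheduler's async 
scheduling threads. The class below is illustrative only and is not existing 
FairScheduler or CapacityScheduler code:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a background scheduling loop decoupled from node
// heartbeats; a multi-threaded variant would simply start several of these.
public class AsyncScheduleLoopSketch {
  private final AtomicBoolean running = new AtomicBoolean(true);

  public Thread start(Runnable schedulingPass, long intervalMs) {
    Thread t = new Thread(() -> {
      while (running.get()) {
        // One scheduling pass over the cluster's nodes.
        schedulingPass.run();
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "FS-AsyncScheduler-sketch");
    t.setDaemon(true);
    t.start();
    return t;
  }

  public void stop() {
    running.set(false);
  }
}
{code}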






[jira] [Commented] (YARN-7319) java.net.UnknownHostException when trying contact node by hostname

2019-04-11 Thread Liu Shaohui (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815232#comment-16815232
 ] 

Liu Shaohui commented on YARN-7319:
---

+1, we hit the same problem when deploying Hadoop on Kubernetes for elastic 
scaling.

I think we should add an option that lets Hadoop run in environments without 
DNS; a sketch of the existing HDFS switch and a hypothetical YARN analog 
follows below.
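
A minimal sketch, assuming the Hadoop {{Configuration}} API, of the existing 
HDFS switch together with a clearly hypothetical YARN analog (no such YARN 
property exists today; the name is illustrative only):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class IpOnlyClusterConfSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Existing HDFS property that relaxes hostname checks during DataNode
    // registration, as mentioned in the issue description below.
    conf.setBoolean("dfs.namenode.datanode.registration.ip-hostname-check", false);
    // Hypothetical YARN analog: this property does NOT exist today. The name
    // only illustrates the kind of option being requested for IP-only clusters.
    conf.setBoolean("yarn.resourcemanager.node-ip-hostname-check", false);
    System.out.println("HDFS hostname check disabled: "
        + !conf.getBoolean("dfs.namenode.datanode.registration.ip-hostname-check", true));
  }
}
{code}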

> java.net.UnknownHostException when trying contact node by hostname
> --
>
> Key: YARN-7319
> URL: https://issues.apache.org/jira/browse/YARN-7319
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Evgeny Makarov
>Priority: Major
>
> I'm trying to set up Hadoop on a Kubernetes cluster with the following setup:
> The Hadoop master is a k8s pod
> Each Hadoop slave is an additional k8s pod
> All communication happens in an IP-based manner. In HDFS I have 
> dfs.namenode.datanode.registration.ip-hostname-check set to false and 
> everything works fine; however, the same option is missing for YARN. 
> Here is part of the hadoop-master log when trying to submit a simple 
> word-count job:
> 2017-10-12 09:00:25,005 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  Error trying to assign container token and NM token to an allocated 
> container container_1507798393049_0001_01_01
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> hadoop-slave-743067341-hqrbk
> at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:258)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:220)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens(SchedulerApplicationAttempt.java:454)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.getAllocation(FiCaSchedulerApp.java:269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocate(CapacityScheduler.java:988)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:971)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:964)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:789)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:776)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.UnknownHostException: hadoop-slave-743067341-hqrbk
> ... 19 more
> As can be seen, the host hadoop-slave-743067341-hqrbk cannot be resolved. 
> Adding a record to /etc/hosts on the master would solve the problem, but that 
> is not an option in a Kubernetes environment. There should be a way to 
> resolve nodes by IP address.






[jira] [Created] (YARN-9472) Add multi-thread asynchronous scheduling to fair scheduler

2019-04-11 Thread zhuqi (JIRA)
zhuqi created YARN-9472:
---

 Summary: Add multi-thread asynchronous scheduling to fair scheduler
 Key: YARN-9472
 URL: https://issues.apache.org/jira/browse/YARN-9472
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler, resourcemanager
Reporter: zhuqi


The capacity scheduler now has multi-threaded asynchronous scheduling; I think 
the fair scheduler needs to support it as well. 






[jira] [Commented] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815133#comment-16815133
 ] 

Abhishek Modi commented on YARN-9435:
-

Thanks [~giovanni.fumarola] for reviewing this. I have attached the v4 patch 
after removing the sleep. 

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.
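
For readers unfamiliar with Hadoop's Metrics2 library, a minimal sketch of the 
usual pattern for such counters. This is illustrative only; it is not the class 
added by the attached patches, and the names are made up:
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Sketch of a Metrics2 source with counters similar to those described above.
// The @Metric fields are instantiated by the metrics system on register().
@Metrics(about = "Opportunistic scheduler metrics (sketch)", context = "yarn")
public class OpportunisticSchedulerMetricsSketch {

  @Metric("Number of allocated opportunistic containers")
  private MutableCounterLong allocatedOpportunisticContainers;

  @Metric("Number of released opportunistic containers")
  private MutableCounterLong releasedOpportunisticContainers;

  public static OpportunisticSchedulerMetricsSketch create() {
    return DefaultMetricsSystem.instance().register(
        "OpportunisticSchedulerMetricsSketch", "Sketch of opportunistic metrics",
        new OpportunisticSchedulerMetricsSketch());
  }

  public void addAllocatedContainers(long count) {
    allocatedOpportunisticContainers.incr(count);
  }

  public void addReleasedContainers(long count) {
    releasedOpportunisticContainers.incr(count);
  }
}
{code}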






[jira] [Updated] (YARN-9435) Add Opportunistic Scheduler metrics in ResourceManager.

2019-04-11 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-9435:

Attachment: YARN-9435.004.patch

> Add Opportunistic Scheduler metrics in ResourceManager.
> ---
>
> Key: YARN-9435
> URL: https://issues.apache.org/jira/browse/YARN-9435
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9435.001.patch, YARN-9435.002.patch, 
> YARN-9435.003.patch, YARN-9435.004.patch
>
>
> Right now there are no metrics available for the Opportunistic Scheduler in 
> the ResourceManager. As part of this jira, we will add metrics such as the 
> number of allocated opportunistic containers, released opportunistic 
> containers, node-level allocations, rack-level allocations, etc. for the 
> Opportunistic Scheduler.


