[jira] [Commented] (YARN-8670) Support scheduling request for SLS input and attributes for Node in SLS

2018-08-23 Thread Sichen zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591187#comment-16591187
 ] 

Sichen zhao commented on YARN-8670:
---

Hi,

My idea for a large number of jobs/requests with placement constraints (PCs) 
is to express the scheduling requests in JSON format, just like the SLS 
traces, and use json2pb to transform each SchedulingRequest into its RPC 
protobuf form.
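
A hedged sketch of that json2pb step (assuming protobuf 3's {{JsonFormat}} 
from protobuf-java-util and the generated {{SchedulingRequestProto}} class; 
illustration only, not part of any posted patch):
{code:java}
import com.google.protobuf.util.JsonFormat;
import org.apache.hadoop.yarn.proto.YarnProtos.SchedulingRequestProto;

public final class SchedulingRequestJson2Pb {
  // Parses one scheduling-request entry from a JSON trace into the protobuf
  // record used on the RPC layer; unknown trace-only fields are skipped.
  public static SchedulingRequestProto fromJson(String json) throws Exception {
    SchedulingRequestProto.Builder builder = SchedulingRequestProto.newBuilder();
    JsonFormat.parser().ignoringUnknownFields().merge(json, builder);
    return builder.build();
  }
}
{code}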

> Support scheduling request for SLS input and attributes for Node in SLS
> ---
>
> Key: YARN-8670
> URL: https://issues.apache.org/jira/browse/YARN-8670
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler-load-simulator
>Affects Versions: YARN-3409
>Reporter: Sichen zhao
>Priority: Major
> Fix For: YARN-3409
>
>
> YARN-3409 introduces placement constraints, but SLS currently does not 
> support specifying them. 
> YARN-8007 supports specifying placement constraints for task containers in 
> SLS, but there is still some room for improvement:
>  # YARN-8007 only supports placement constraints at the job level. A more 
> flexible way is to support placement constraints at the task level.
>  # In most scenarios, a node has some characteristics of its own, called 
> attributes, which are not supported in SLS. So we can add attributes to 
> nodes.
>  # YARN-8007 uses SYNTH as the scheduling-request input. But SYNTH can't 
> create a large number of specific resource requests. We want to create a new 
> scheduling-request input format (like the SLS format) for more authentic 
> input: we can add some fields to the SLS format, and that becomes the 
> scheduling-request input format.






[jira] [Created] (YARN-8707) It's not reasonable to decide whether app is starved by fairShare

2018-08-23 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-8707:
-

 Summary: It's not reasonable to decide whether app is starved by 
fairShare
 Key: YARN-8707
 URL: https://issues.apache.org/jira/browse/YARN-8707
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha3
Reporter: Zhaohui Xin
Assignee: Zhaohui Xin


When an app's usage has reached its demand, it is still considered fair-share 
starved. Obviously, that's not reasonable!
{code:java}
boolean isStarvedForFairShare() {
  return isUsageBelowShare(getResourceUsage(), getFairShare());
}
{code}
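
A minimal sketch of the stricter check being suggested (helper names reused 
from the snippet above, plus a {{getDemand()}} accessor assumed from the fair 
scheduler code; not a posted patch):
{code:java}
boolean isStarvedForFairShare() {
  // An app whose usage has already reached its demand cannot use more
  // resources, so it should not be treated as fair-share starved.
  return isUsageBelowShare(getResourceUsage(), getFairShare())
      && isUsageBelowShare(getResourceUsage(), getDemand());
}
{code}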






[jira] [Commented] (YARN-7773) YARN Federation used Mysql as state store throw exception, Unknown column 'homeSubCluster' in 'field list'

2018-08-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591138#comment-16591138
 ] 

Bibin A Chundatt commented on YARN-7773:


Backported to branch-3.1 too

> YARN Federation used Mysql as state store throw exception, Unknown column 
> 'homeSubCluster' in 'field list'
> --
>
> Key: YARN-7773
> URL: https://issues.apache.org/jira/browse/YARN-7773
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0-alpha1, 3.0.0-alpha2, 3.0.0-beta1, 
> 3.0.0-alpha4, 3.0.0-alpha3, 3.0.0
> Environment: Hadoop 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Blocker
>  Labels: patch
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-7773.001.patch
>
>
> An error occurred when YARN Federation used MySQL as the state store. The 
> reason I found is that the field used to create the 
> applicationsHomeSubCluster table was 'subClusterId' while the stored 
> procedure used 'homeSubCluster'. I fixed this problem.
>  
> submitApplication appIdapplication_1516277664083_0014 try #0 on SubCluster 
> cluster1 , queue: root.bdp_federation
>  [2018-01-18T23:25:29.325+08:00] [ERROR] 
> store.impl.SQLFederationStateStore.logAndThrowRetriableException(FederationStateStoreUtils.java
>  158) [IPC Server handler 44 on 8050] : Unable to insert the newly generated 
> application application_1516277664083_0014
>  com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'homeSubCluster' in 'field list'
>  at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
>  at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
>  at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
>  at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
>  at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
>  at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
>  at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
>  at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>  at com.sun.proxy.$Proxy31.addApplicationHomeSubCluster(Unknown Source)
>  at 
> org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade.addApplicationHomeSubCluster(FederationStateStoreFacade.java:345)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.JDFederationClientInterceptor.submitApplication(JDFederationClientInterceptor.java:334)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:196)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  at org.apache.hadoop.ipc.Server$Handl

[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-23 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591120#comment-16591120
 ] 

Zhankun Tang commented on YARN-8698:


[~yuan_zac] Yeah, thanks for the clarification. I actually tried it in my 
environment yesterday, as you posted. A wrong HADOOP_COMMON_HOME will cause 
problems no matter what HADOOP_HDFS_HOME is.

However, I found that if HADOOP_COMMON_HOME is not set, a correct 
HADOOP_HDFS_HOME also works. That's why I asked whether you had specified it.

Since we have now specified a correct HADOOP_HDFS_HOME, and submarine doesn't 
set HADOOP_COMMON_HOME if I remember correctly, who sets the wrong 
HADOOP_COMMON_HOME? I'm not sure at present, but the current patch seems to be 
a workaround that doesn't address the root cause?


> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine TF job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> The Hadoop env variables in launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh looks like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   






[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-23 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040
 ] 

Zac Zhou edited comment on YARN-8698 at 8/24/18 1:55 AM:
-

Hi [~tangzhankun],

yeah, I specified "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0 which is the 
hadoop home directory in docker image.

In docker, DOCKER_HADOOP_HDFS_HOME takes effect,  but it is not enough.

I think you can test it even without docker env.

 when you just specify HADOOP_HDFS_HOME, it works well as follows:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But if a wrong HADOOP_COMMON_HOME is specified together with a correct 
HADOOP_HDFS_HOME, it will fail.
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
 

 


was (Author: yuan_zac):
Hi [~tangzhankun],

yeah, I specified "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0 which is the 
hadoop home directory in docker image.

In docker, DOCKER_HADOOP_HDFS_HOME takes effect,  but it is not enough.

I think you can test it even without docker env.

 when you just specify HADOOP_HDFS_HOME, it works well as follows:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But, if you specify a wrong  HADOOP_COMMON_HOME with a correct 
HADOOP_HDFS_HOME, it will fail.
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
 

 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine TF job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> The Hadoop env variables in launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yar

[jira] [Closed] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang closed YARN-8704.
-

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  






[jira] [Resolved] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-8704.
---
Resolution: Invalid

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  






[jira] [Commented] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591039#comment-16591039
 ] 

Weiwei Yang commented on YARN-8704:
---

Hi [~eyang], [~shaneku...@gmail.com]

Thanks for the response. You are correct, I missed that message in the log:

{noformat}

Shell error output: Invalid docker rw mount 
'/home/wwei/hadoop-data/yarn/log/application_1535074859217_0001/container_1535074859217_0001_01_02:/home/wwei/hadoop-data/yarn/log/application_1535074859217_0001/container_1535074859217_0001_01_02:rw',
 
realpath=/home/wwei/hadoop-data/yarn/log/application_1535074859217_0001/container_1535074859217_0001_01_02
Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'

{noformat}

It does display the invalid mount. Closing this as "Not a problem".

Thanks for your time!

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  






[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-23 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040
 ] 

Zac Zhou commented on YARN-8698:


Hi [~tangzhankun],

yeah, I specified "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0 which is the 
hadoop home directory in docker image.

In docker, DOCKER_HADOOP_HDFS_HOME takes effect,  but it is not enough.

I think you can test it even without docker env.

 when you just specify HADOOP_HDFS_HOME, it works well as follows:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But if you specify a wrong HADOOP_COMMON_HOME with a correct 
HADOOP_HDFS_HOME, it fails.
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
 

 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine TF job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> The Hadoop env variables in launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh looks like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   






[jira] [Comment Edited] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-08-23 Thread Zac Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591040#comment-16591040
 ] 

Zac Zhou edited comment on YARN-8698 at 8/24/18 1:47 AM:
-

Hi [~tangzhankun],

yeah, I specified "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0 which is the 
hadoop home directory in docker image.

In docker, DOCKER_HADOOP_HDFS_HOME takes effect,  but it is not enough.

I think you can test it even without docker env.

 when you just specify HADOOP_HDFS_HOME, it works well as follows:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But if you specify a wrong HADOOP_COMMON_HOME with a correct 
HADOOP_HDFS_HOME, it will fail.
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
 

 


was (Author: yuan_zac):
Hi [~tangzhankun],

yeah, I specified "DOCKER_HADOOP_HDFS_HOME" to /hadoop-3.1.0 which is the 
hadoop home directory in docker image.

In docker, DOCKER_HADOOP_HDFS_HOME takes effect,  but it is not enough.

I think you can test it even without docker env.

 when you just specify HADOOP_HDFS_HOME, it works well as follows:
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_HDFS_HOME=/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ echo $HADOOP_HDFS_HOME
/home/hadoop/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
/home/hadoop/yarn-submarine/etc/hadoop:/home/hadoop/yarn-submarine/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/commons-lang3-3.7.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/gson-2.2.4.jar:/home/hadoop/yarn-submarine/share/hadoop/common/lib/paranamer-2.3.jar:
{code}
But, if you specify a wrong  HADOOP_COMMON_HOME with a correct 
HADOOP_HDFS_HOME, it failed.
{code:java}
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ export 
HADOOP_COMMON_HOME=/home/hadoop
hadoop@hostname:~/zq/submarine-lib/hadoop-3.2.0-SNAPSHOT$ ./bin/hadoop 
classpath --glob
HADOOP_SUBCMD_SUPPORTDAEMONIZATION: false
HADOOP_SUBCMD_SECURESERVICE: false
HADOOP_DAEMON_MODE: default
Error: Could not find or load main class org.apache.hadoop.util.Classpath
{code}
 

 

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine TF job is submitted, the following error occurs:
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to the hadoop classpath.
> The Hadoop env variables in launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-su

[jira] [Updated] (YARN-7773) YARN Federation used Mysql as state store throw exception, Unknown column 'homeSubCluster' in 'field list'

2018-08-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-7773:
---
Fix Version/s: 3.1.2

> YARN Federation used Mysql as state store throw exception, Unknown column 
> 'homeSubCluster' in 'field list'
> --
>
> Key: YARN-7773
> URL: https://issues.apache.org/jira/browse/YARN-7773
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0-alpha1, 3.0.0-alpha2, 3.0.0-beta1, 
> 3.0.0-alpha4, 3.0.0-alpha3, 3.0.0
> Environment: Hadoop 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Blocker
>  Labels: patch
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-7773.001.patch
>
>
> An error occurred when YARN Federation used MySQL as the state store. The 
> reason I found is that the field used to create the 
> applicationsHomeSubCluster table was 'subClusterId' while the stored 
> procedure used 'homeSubCluster'. I fixed this problem.
>  
> submitApplication appIdapplication_1516277664083_0014 try #0 on SubCluster 
> cluster1 , queue: root.bdp_federation
>  [2018-01-18T23:25:29.325+08:00] [ERROR] 
> store.impl.SQLFederationStateStore.logAndThrowRetriableException(FederationStateStoreUtils.java
>  158) [IPC Server handler 44 on 8050] : Unable to insert the newly generated 
> application application_1516277664083_0014
>  com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'homeSubCluster' in 'field list'
>  at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
>  at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
>  at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
>  at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
>  at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
>  at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
>  at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
>  at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>  at com.sun.proxy.$Proxy31.addApplicationHomeSubCluster(Unknown Source)
>  at 
> org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade.addApplicationHomeSubCluster(FederationStateStoreFacade.java:345)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.JDFederationClientInterceptor.submitApplication(JDFederationClientInterceptor.java:334)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:196)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2076)
>  at org.apache.hadoop.ipc.Server$

[jira] [Commented] (YARN-8488) YARN service/components/instances should have SUCCEEDED/FAILED states

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591033#comment-16591033
 ] 

genericqa commented on YARN-8488:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core:
 The patch generated 12 new + 51 unchanged - 3 fixed = 63 total (was 54) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
21s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8488 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936922/YARN-8488.6.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ccdb64f5d027 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca29fb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21672/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/21672/artifact/out/whitespace-eol.txt
 |
|  Test Results |

[jira] [Commented] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590992#comment-16590992
 ] 

genericqa commented on YARN-8705:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
38s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8705 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936911/YARN-8705.v2.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f32c4a6af1aa 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca29fb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| 

[jira] [Commented] (YARN-8488) YARN service/components/instances should have SUCCEEDED/FAILED states

2018-08-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590973#comment-16590973
 ] 

Suma Shivaprasad commented on YARN-8488:


Thanks [~eyang]. Attached a patch with the review comments fixed.

> YARN service/components/instances should have SUCCEEDED/FAILED states
> -
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, 
> YARN-8488.4.patch, YARN-8488.5.patch, YARN-8488.6.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long running 
> applications like TensorFlow.
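> A hedged sketch of the enum with the proposed terminal state added (the 
> existing constants are from the snippet above; only SUCCEEDED is new):
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE, SUCCEEDED;
> }
> {code} 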






[jira] [Updated] (YARN-8488) YARN service/components/instances should have SUCCEEDED/FAILED states

2018-08-23 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8488:
---
Attachment: YARN-8488.6.patch

> YARN service/components/instances should have SUCCEEDED/FAILED states
> -
>
> Key: YARN-8488
> URL: https://issues.apache.org/jira/browse/YARN-8488
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8488.1.patch, YARN-8488.2.patch, YARN-8488.3.patch, 
> YARN-8488.4.patch, YARN-8488.5.patch, YARN-8488.6.patch
>
>
> Existing YARN service has following states:
> {code} 
> public enum ServiceState {
>   ACCEPTED, STARTED, STABLE, STOPPED, FAILED, FLEX, UPGRADING,
>   UPGRADING_AUTO_FINALIZE;
> }
> {code} 
> Ideally we should add a "SUCCEEDED" state in order to support long running 
> applications like TensorFlow.






[jira] [Updated] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8705:
---
Attachment: YARN-8705.v2.patch

> Refactor in preparation for YARN-8696
> -
>
> Key: YARN-8705
> URL: https://issues.apache.org/jira/browse/YARN-8705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8705.v1.patch, YARN-8705.v2.patch
>
>
> Refactor the UAM heartbeat thread as well as the callback method, in 
> preparation for the YARN-8696 FederationInterceptor upgrade.






[jira] [Commented] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590909#comment-16590909
 ] 

genericqa commented on YARN-8705:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
23s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 82m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8705 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936896/YARN-8705.v1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 155ec2c403fa 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca29fb7 |
| maven | versio

[jira] [Comment Edited] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590876#comment-16590876
 ] 

Craig Condit edited comment on YARN-8638 at 8/23/18 10:22 PM:
--

Current warnings (shouldn't prevent merging IMO):

  - checkstyle warning is due to missing javadoc in a test class

  - javadoc warning is due to maven javadoc plugin not being able to resolve 
{{\{\@value\}}} with references to {{YarnConfiguration}} constants. As this 
javadoc is excluded from public API docs anyway, probably not too critical, and 
several other instances already exist in the same class.


was (Author: ccondit-target):
Current warnings (shouldn't prevent merging IMO):

  - checkstyle warning is due to missing javadoc in a test class

  - javadoc warning is due to maven javadoc plugin not being able to resolve 
{{{@value}}} with references to {{YarnConfiguration}} constants. As this 
javadoc is excluded from public API docs anyway, probably not too critical, and 
several other instances already exist in the same class.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, 
> YARN-8638.003.patch, YARN-8638.004.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution 
> (see the sketch after this description).
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.
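> A hedged sketch of the generalized hook and selection loop described above 
> ({{isRuntimeRequested}} is named in this description; the other names and 
> signatures here are assumed for illustration only):
> {code:java}
> import java.util.List;
> import java.util.Map;
> 
> interface PluggableLinuxContainerRuntime {
>   // Generalization of isDockerContainerRequested() /
>   // isSandboxContainerRequested(): a runtime inspects the container
>   // environment and claims ownership of the execution.
>   boolean isRuntimeRequested(Map<String, String> env);
> }
> 
> class RuntimeSelector {
>   // Sketch of the proposed delegating behavior: pluggable runtimes are
>   // consulted in allowed-runtimes order, with "default" as the fallback.
>   static PluggableLinuxContainerRuntime pick(
>       List<PluggableLinuxContainerRuntime> pluggable,
>       Map<String, String> env,
>       PluggableLinuxContainerRuntime defaultRuntime) {
>     for (PluggableLinuxContainerRuntime runtime : pluggable) {
>       if (runtime.isRuntimeRequested(env)) {
>         return runtime;
>       }
>     }
>     return defaultRuntime;
>   }
> }
> {code}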






[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590876#comment-16590876
 ] 

Craig Condit commented on YARN-8638:


Current warnings (shouldn't prevent merging IMO):

  - checkstyle warning is due to missing javadoc in a test class

  - javadoc warning is due to maven javadoc plugin not being able to resolve 
{{{@value}}} with references to {{YarnConfiguration}} constants. As this 
javadoc is excluded from public API docs anyway, probably not too critical, and 
several other instances already exist in the same class.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch, 
> YARN-8638.003.patch, YARN-8638.004.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590869#comment-16590869
 ] 

genericqa commented on YARN-8638:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 222 unchanged - 0 fixed = 223 total (was 222) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
30s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8638 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936883/YARN-8638.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 052bf2824442 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca29fb7 |
| maven | version: Apache Maven 3.3.9 |

[jira] [Updated] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-23 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8706:

Description: 
{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace period used by docker stop
 [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default 
this is set to {{250 milliseconds}}, so irrespective of the container type, 
it will always get executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period:
- when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
executing DelayedProcessKiller
- when sleepDelayBeforeSigKill < 1 second, the grace period should be the 
smallest value, which is 1 second, because we are forcing a kill after 
250 ms anyway

 

  was:
{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace time use by docker stop
 [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of the docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExcecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default 
this is set to {{250 milliseconds}} and so irrespective of the container type, 
it will always get executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period
- when sleepDelayBeforeSigKill > 10 seconds, then there is no point of 
executing DelayedProcessKiller
- when sleepDelayBeforeSigKill < 1 second, then the grace period should be 
least, which is 1 second, because anyways we are forcing kill after 250 ms

 


> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so irrespective of the 
> container type, it will always get executed.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the smallest value, which is 1 second, because we are forcing a kill 
> after 250 ms anyway
>  
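
A minimal sketch of the second point, assuming the grace period were derived from the NM's {{yarn.nodemanager.sleep-delay-before-sigkill.ms}} value (the helper below is hypothetical, not part of any attached patch):
{code:java}
/**
 * Sketch only: derive a docker stop grace period from the NM kill delay.
 * sleepDelayMs is assumed to come from
 * yarn.nodemanager.sleep-delay-before-sigkill.ms (default 250).
 */
class GracePeriodSketch {
  static int dockerStopGraceSeconds(long sleepDelayMs) {
    // docker stop only accepts whole seconds, and 1 second is the smallest
    // useful grace given the NM would otherwise force-kill after 250 ms.
    return Math.max(1, (int) (sleepDelayMs / 1000));
  }
}
{code}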



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-23 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8706:

Description: 
{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace period used by docker stop
 [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default 
this is set to {{250 milliseconds}}, so irrespective of the container type, 
it will always get executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period:
- when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
executing DelayedProcessKiller
- when sleepDelayBeforeSigKill < 1 second, the grace period should be the 
minimum, which is 1 second, because we are forcing a kill after 250 ms anyway

 

  was:
{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace time use by docker stop
 [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of the docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExcecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default 
this is set to {{250 milliseconds}} and so irrespective of the container type, 
it will get always get executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period, so having {{DelayedProcessKiller}} seems redundant.


> DelayedProcessKiller is executed for Docker containers even though docker 
> stop sends a KILL signal after the specified grace period
> ---
>
> Key: YARN-8706
> URL: https://issues.apache.org/jira/browse/YARN-8706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: docker
>
> {{DockerStopCommand}} adds a grace period of 10 seconds.
> 10 seconds is also the default grace period used by docker stop
>  [https://docs.docker.com/engine/reference/commandline/stop/]
> Documentation of docker stop:
> {quote}the main process inside the container will receive {{SIGTERM}}, and 
> after a grace period, {{SIGKILL}}.
> {quote}
> There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
> for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By 
> default this is set to {{250 milliseconds}}, so irrespective of the 
> container type, it will always get executed.
>  
> For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
> after the grace period:
> - when sleepDelayBeforeSigKill > 10 seconds, there is no point in 
> executing DelayedProcessKiller
> - when sleepDelayBeforeSigKill < 1 second, the grace period should be 
> the minimum, which is 1 second, because we are forcing a kill after 250 ms anyway
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8705:
---
Attachment: YARN-8705.v1.patch

> Refactor in preparation for YARN-8696
> -
>
> Key: YARN-8705
> URL: https://issues.apache.org/jira/browse/YARN-8705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8705.v1.patch
>
>
> Refactor the UAM heartbeat thread as well as call back method in preparation 
> for YARN-8696 FederationInterceptor upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8705:
---
Attachment: (was: YARN-8705.v1.patch)

> Refactor in preparation for YARN-8696
> -
>
> Key: YARN-8705
> URL: https://issues.apache.org/jira/browse/YARN-8705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>
> Refactor the UAM heartbeat thread as well as call back method in preparation 
> for YARN-8696 FederationInterceptor upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8705:
---
Attachment: YARN-8705.v1.patch

> Refactor in preparation for YARN-8696
> -
>
> Key: YARN-8705
> URL: https://issues.apache.org/jira/browse/YARN-8705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8705.v1.patch
>
>
> Refactor the UAM heartbeat thread as well as call back method in preparation 
> for YARN-8696 FederationInterceptor upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8706) DelayedProcessKiller is executed for Docker containers even though docker stop sends a KILL signal after the specified grace period

2018-08-23 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8706:
---

 Summary: DelayedProcessKiller is executed for Docker containers 
even though docker stop sends a KILL signal after the specified grace period
 Key: YARN-8706
 URL: https://issues.apache.org/jira/browse/YARN-8706
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


{{DockerStopCommand}} adds a grace period of 10 seconds.

10 seconds is also the default grace period used by docker stop
 [https://docs.docker.com/engine/reference/commandline/stop/]

Documentation of docker stop:
{quote}the main process inside the container will receive {{SIGTERM}}, and 
after a grace period, {{SIGKILL}}.
{quote}
There is a {{DelayedProcessKiller}} in {{ContainerExecutor}} which executes 
for all containers after a delay when {{sleepDelayBeforeSigKill>0}}. By default 
this is set to {{250 milliseconds}}, so irrespective of the container type, 
it will always get executed.
 
For a docker container, {{docker stop}} takes care of sending a {{SIGKILL}} 
after the grace period, so having {{DelayedProcessKiller}} seems redundant.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590794#comment-16590794
 ] 

genericqa commented on YARN-8675:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
34s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8675 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936871/YARN-8675.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 50e256b437b5 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca29fb7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21668/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21668/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590762#comment-16590762
 ] 

genericqa commented on YARN-8638:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
16s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 223 unchanged - 0 fixed = 227 total (was 223) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
39s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m  
6s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
57s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8638 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936864/YARN-8638.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8dc506e639ea 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 585ebd8 |
| maven | version: Apache Maven 3.3.9 |

[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590706#comment-16590706
 ] 

Suma Shivaprasad commented on YARN-8675:


Attached a patch which sets the hostname if it is specified via the env or if the 
network is not host. This should work for all cases: MR, Spark, and the YARN Service 
AM, which sets a custom hostname. 
Thanks [~eyang] [~shaneku...@gmail.com] [~billie.rinaldi] 
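
In rough terms, the check could look like the sketch below (placeholder names, assuming the {{YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME}} env key is how a custom hostname is requested; this is not the attached patch):
{code:java}
import java.util.Map;

/** Sketch of the proposed check; names are placeholders, not the patch. */
class HostnameCheckSketch {
  // assumed env key a user would set to request a custom hostname
  static final String ENV_HOSTNAME =
      "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME";

  static boolean shouldSetHostname(Map<String, String> env, String network) {
    boolean hostNetwork = "host".equals(network);
    // set the hostname only when explicitly requested via the env,
    // or when the container is not on the "host" network
    return env.containsKey(ENV_HOSTNAME) || !hostNetwork;
  }
}
{code}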

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8675.1.patch
>
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled, but the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8675:
---
Attachment: YARN-8675.1.patch

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8675.1.patch
>
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled, but the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8705:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-5597

> Refactor in preparation for YARN-8696
> -
>
> Key: YARN-8705
> URL: https://issues.apache.org/jira/browse/YARN-8705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>
> Refactor the UAM heartbeat thread as well as call back method in preparation 
> for YARN-8696 FederationInterceptor upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8705) Refactor in preparation for YARN-8696

2018-08-23 Thread Botong Huang (JIRA)
Botong Huang created YARN-8705:
--

 Summary: Refactor in preparation for YARN-8696
 Key: YARN-8705
 URL: https://issues.apache.org/jira/browse/YARN-8705
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang


Refactor the UAM heartbeat thread as well as call back method in preparation 
for YARN-8696 FederationInterceptor upgrade



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590663#comment-16590663
 ] 

genericqa commented on YARN-8697:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
26s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8697 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936861/YARN-8697.v2.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9fed8db72d56 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 585ebd8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21666/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21666/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource

[jira] [Commented] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590639#comment-16590639
 ] 

Eric Yang commented on YARN-8704:
-

If I recall correctly, the message does print the invalid path for 
debugging, unless the source path is an empty string, in which case nothing is printed.

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590633#comment-16590633
 ] 

Shane Kumpf commented on YARN-8704:
---

[~cheersyang] - Thanks for reporting this. Can you share a bit more detail on 
which log you saw the error in? It would also be helpful if you could share the 
log entries surrounding the error as well. There should be a log entry calling 
out the problematic mount, so I'm curious if it was just overlooked.

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590587#comment-16590587
 ] 

Eric Yang commented on YARN-8638:
-

[~ccondit-target] Can {@link #STATIC_NON_PRIMITIVE_FIELD} work as a substitute? 
The entire node manager javadoc is excluded from the public API javadoc.  The javadoc 
problem is not important, but it is good to have a clean checklist, and it will 
reduce work for future clean-up.

[~jlowe] I was wondering whether access could be enforced using the top-level java 
security {{package.access}} option, to make sure that a system admin can lock down 
the pluggable runtimes somehow for security reasons, in case someone finds a way 
to fool the {{yarn.nodemanager.runtime.linux.allowed-runtimes}} setting.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8697) LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when cannot resolve resource

2018-08-23 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8697:
---
Attachment: YARN-8697.v2.patch

> LocalityMulticastAMRMProxyPolicy should fallback to random sub-cluster when 
> cannot resolve resource
> ---
>
> Key: YARN-8697
> URL: https://issues.apache.org/jira/browse/YARN-8697
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8697.v1.patch, YARN-8697.v2.patch
>
>
> Right now in LocalityMulticastAMRMProxyPolicy, whenever we cannot resolve the 
> resource name (node or rack), we always route the request to the home 
> sub-cluster. However, the home sub-cluster might not always be ready to use 
> (timed out, YARN-8581) or enabled (by AMRMProxyPolicy weights). It might also 
> be overwhelmed by the requests if the sub-cluster resolver has some issue. In 
> this Jira, we change it to pick a random active and enabled sub-cluster 
> for resource requests we cannot resolve. 
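
A self-contained sketch of that fallback (class and field names are illustrative, not the actual policy code):
{code:java}
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Sketch of the fallback; class and field names are illustrative only. */
class FallbackRouterSketch {
  private final List<String> activeAndEnabledSubClusters; // assumed non-empty

  FallbackRouterSketch(List<String> activeAndEnabledSubClusters) {
    this.activeAndEnabledSubClusters = activeAndEnabledSubClusters;
  }

  String route(String resolvedSubCluster) {
    if (resolvedSubCluster != null) {
      return resolvedSubCluster; // the resolver knew the node/rack
    }
    // cannot resolve: pick a random active, enabled sub-cluster rather
    // than always falling back to the home sub-cluster
    int i = ThreadLocalRandom.current()
        .nextInt(activeAndEnabledSubClusters.size());
    return activeAndEnabledSubClusters.get(i);
  }
}
{code}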



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590570#comment-16590570
 ] 

Yufei Gu commented on YARN-8632:


[~luxianghao], thanks for the patch. Nice finding. +1 for the patch v3. Will 
commit later. Do you need a patch for 2.7?

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler. I encountered some 
> problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I just got "[]" in the file 
> realtimetrack.json. Finally, I found the MetricsLogRunnable thread will exit 
> because of an NPE; the reason is that "wrapper.getQueueSet()" is still null when 
> executing "String metrics = web.generateRealTimeTrackingMetrics();"
>  So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try section to avoid the MetricsLogRunnable thread exiting with an 
> unexpected exception. 
>  My hadoop version is 2.7.2; it seems that the hadoop trunk branch also has the 
> second problem, and I have made a patch to solve it.
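
A sketch of the proposed fix, with the surrounding names ({{web}}, {{metricsLogBW}}) assumed from the SLS source rather than verified here:
{code:java}
// Sketch of the fix: keep the metrics call inside the try block so an early
// NullPointerException cannot terminate the logging thread. The web and
// metricsLogBW fields are assumed from the SLS MetricsLogRunnable.
@Override
public void run() {
  try {
    // wrapper.getQueueSet() may still be null this early; any NPE thrown
    // here is now caught instead of killing MetricsLogRunnable
    String metrics = web.generateRealTimeTrackingMetrics();
    metricsLogBW.write(metrics + ",\n");
    metricsLogBW.flush();
  } catch (Exception e) {
    // ignore and retry on the next scheduled run
  }
}
{code}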



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590572#comment-16590572
 ] 

Jason Lowe commented on YARN-8638:
--

bq. However, it can also cause security issues, if the configuration can be 
overridden to trigger undesired behavior. I don't know if we need to build more 
security logic here to prevent loading of arbitrary classes, but it is worth some 
effort to discuss this up front. 

If the configuration properties and/or classpath are compromised then there are 
far bigger security issues in play than just what container runtime will be 
loaded.  Limiting the plugin to a particular java package will not provide any 
additional security, since someone with malicious intent will craft their class 
with the magic java package prefix just as legitimate users would be forced to 
do.  If the attacker can drop arbitrary jars on the classpath _and_ change 
the configs, there's little limit to what they can compromise since they can 
likely replace existing Hadoop classes wholesale with their own.  A package 
prefix check does not provide any significant additional security, but it will 
frustrate users trying to get their legitimate runtime plugin class to load.


> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> 
>  yarn.nodemanager.runtime.linux.allowed-runtimes
>  default,docker,experimental
> 
>  
> 
>  yarn.nodemanager.runtime.linux.experimental.class
>  com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7086) Release all containers asynchronously

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590556#comment-16590556
 ] 

Jason Lowe commented on YARN-7086:
--

bq. I assume you are referring to the lock inside LeafQueue#completedContainer().

I was referring to the scheduler back in the 2.7/2.8 code, which has changed 
considerably in trunk.  Back in 2.7, releasing a container required 
the highly-contended CapacityScheduler lock to be obtained, separately, for 
every container released.  When releasing a lot of containers in a single AM 
heartbeat, this caused a long backup as the highly-contended lock needed to be 
reacquired for every released container.  It would have been far more efficient 
to just grab the lock once and release all the containers with the lock held 
the entire time.

The big CapacityScheduler lock appears to be gone in trunk, so I would expect 
the next level of locking bottleneck to be the LeafQueue lock.
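
A self-contained sketch of the contrast (names do not correspond to actual scheduler code):
{code:java}
import java.util.List;
import java.util.function.Consumer;

/** Illustrative sketch only; names do not match the actual scheduler code. */
class BatchReleaseSketch {
  // Old pattern: the contended lock is re-acquired per container.
  static void releaseOneByOne(Object lock, List<String> ids,
      Consumer<String> complete) {
    for (String id : ids) {
      synchronized (lock) {
        complete.accept(id);
      }
    }
  }

  // Far cheaper under contention: hold the lock once for the whole batch.
  static void releaseBatched(Object lock, List<String> ids,
      Consumer<String> complete) {
    synchronized (lock) {
      for (String id : ids) {
        complete.accept(id);
      }
    }
  }
}
{code}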

> Release all containers asynchronously
> 
>
> Key: YARN-7086
> URL: https://issues.apache.org/jira/browse/YARN-7086
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Arun Suresh
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-7086.001.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> bring scheduling of new containers to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure, and the 
> scheduler releases all their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc 
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590528#comment-16590528
 ] 

Weiwei Yang commented on YARN-8704:
---

cc [~shaneku...@gmail.com], [~eyang]

> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container:
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8704:
--
Description: 
Seeing the following error message while starting a privileged docker container
{noformat}
Error constructing docker command, docker error code=14,
error message='Invalid docker read-write mount'
{noformat}
It would be good if it told us which mount is invalid and how to fix it.

 

  was:
Seeing the following error message while starting a privileged docker container

{noformat}

Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'
{noformat}

It would be good if it told us which mount is invalid and how to fix it.

 


> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container
> {noformat}
> Error constructing docker command, docker error code=14,
> error message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8704:
--
Description: 
Seeing the following error message while starting a privileged docker container
{noformat}
Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'
{noformat}
It would be good if it told us which mount is invalid and how to fix it.

 

  was:
Seeing the following error message while starting a privileged docker container
{noformat}
Error constructing docker command, docker error code=14,
error message='Invalid docker read-write mount'
{noformat}
It would be good if it told us which mount is invalid and how to fix it.

 


> Improve the error message for an invalid docker rw mount to be more 
> informative
> ---
>
> Key: YARN-8704
> URL: https://issues.apache.org/jira/browse/YARN-8704
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Minor
>
> Seeing the following error message while starting a privileged docker container
> {noformat}
> Error constructing docker command, docker error code=14, error 
> message='Invalid docker read-write mount'
> {noformat}
> It would be good if it told us which mount is invalid and how to fix it.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8704) Improve the error message for an invalid docker rw mount to be more informative

2018-08-23 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8704:
-

 Summary: Improve the error message for an invalid docker rw mount 
to be more informative
 Key: YARN-8704
 URL: https://issues.apache.org/jira/browse/YARN-8704
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: Weiwei Yang


Seeing the following error message while starting a privileged docker container

{noformat}

Error constructing docker command, docker error code=14, error message='Invalid 
docker read-write mount'
{noformat}

It would be good if it told us which mount is invalid and how to fix it.
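For illustration only, a more informative variant might name the offending mount and point at the relevant whitelist, along these lines (hypothetical output; {{docker.allowed.rw-mounts}} in container-executor.cfg is assumed to be the whitelist this check consults):

{noformat}
Error constructing docker command, docker error code=14, error message='Invalid
docker read-write mount: /usr/local/files:/mnt/files is not in
docker.allowed.rw-mounts'
{noformat}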

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7086) Release all containers asynchronously

2018-08-23 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590507#comment-16590507
 ] 

Manikandan R edited comment on YARN-7086 at 8/23/18 4:49 PM:
-

Thanks [~asuresh]

Attached .001 patch for early review. It has changes as described in 
https://issues.apache.org/jira/browse/YARN-7086?focusedCommentId=16140295&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16140295.

[~jlowe]

{quote}I think it would be a lot better if there was a bulk-release interface 
so we could grab the critical lock once.{quote}

I assume you are referring to the lock inside LeafQueue#completedContainer(). If 
the answer is yes, one approach would be to change 
Scheduler#completedContainer(), Scheduler#completedContainerInternal() and 
LeafQueue#completedContainer() to accept a list of containers and process it 
accordingly, as opposed to accepting a single container. Currently, all these 
methods accept a single RMContainer and operate on that container. With this new 
approach, we will need to see how we can accept a list and traverse it 
accordingly. Can you please confirm this?


was (Author: maniraj...@gmail.com):
Attached .001 patch for early review. It has changes as described in 
https://issues.apache.org/jira/browse/YARN-7086?focusedCommentId=16140295&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16140295.

[~jlowe]

{quote}I think it would be a lot better if there was a bulk-release interface 
so we could grab the critical lock once.{quote}

I assume you are referring to the lock inside LeafQueue#completedContainer(). If 
the answer is yes, one approach would be to change 
Scheduler#completedContainer(), Scheduler#completedContainerInternal() and 
LeafQueue#completedContainer() to accept a list of containers and process it 
accordingly, as opposed to accepting a single container. Currently, all these 
methods accept a single RMContainer and operate on that container. With this new 
approach, we will need to see how we can accept a list and traverse it 
accordingly. Can you please confirm this?

> Release all containers asynchronously
> 
>
> Key: YARN-7086
> URL: https://issues.apache.org/jira/browse/YARN-7086
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Arun Suresh
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-7086.001.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> cause scheduling of new containers to come to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure, the 
> scheduler releases all their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc 
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7086) Release all containers asynchronously

2018-08-23 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-7086:
---
Attachment: YARN-7086.001.patch

> Release all containers asynchronously
> 
>
> Key: YARN-7086
> URL: https://issues.apache.org/jira/browse/YARN-7086
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Arun Suresh
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-7086.001.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> cause scheduling of new containers to come to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure, the 
> scheduler releases all their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc 
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7086) Release all containers asynchronously

2018-08-23 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590507#comment-16590507
 ] 

Manikandan R commented on YARN-7086:


Attached .001 patch for early review. It has changes as described in 
https://issues.apache.org/jira/browse/YARN-7086?focusedCommentId=16140295&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16140295.

[~jlowe]

{quote}I think it would be a lot better if there was a bulk-release interface 
so we could grab the critical lock once.{quote}

I assume you are referring to the lock inside LeafQueue#completedContainer(). If 
the answer is yes, one approach would be to change 
Scheduler#completedContainer(), Scheduler#completedContainerInternal() and 
LeafQueue#completedContainer() to accept a list of containers and process it 
accordingly, as opposed to accepting a single container. Currently, all these 
methods accept a single RMContainer and operate on that container. With this new 
approach, we will need to see how we can accept a list and traverse it 
accordingly. Can you please confirm this?
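A rough sketch of that list-based refactoring (invented, simplified signatures; the real methods also take a ContainerStatus and an event type) might look like:

{code:java}
import java.util.Collections;
import java.util.List;

// Simplified sketch of the proposed list-based completedContainer() path.
// Class and method shapes are illustrative, not the actual scheduler API.
abstract class SchedulerSketch {

  // The existing single-container entry point becomes a thin wrapper...
  public void completedContainer(RMContainerSketch container) {
    completedContainers(Collections.singletonList(container));
  }

  // ...and the new bulk entry point processes the whole list under one lock,
  // so releasing N containers pays the lock-acquisition cost once, not N times.
  public void completedContainers(List<RMContainerSketch> containers) {
    synchronized (this) { // stand-in for the contended LeafQueue lock
      for (RMContainerSketch container : containers) {
        completedContainerInternal(container);
      }
    }
  }

  protected abstract void completedContainerInternal(RMContainerSketch container);
}

// Placeholder for the real RMContainer type.
class RMContainerSketch { }
{code}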

> Release all containers asynchronously
> 
>
> Key: YARN-7086
> URL: https://issues.apache.org/jira/browse/YARN-7086
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Arun Suresh
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-7086.001.patch
>
>
> We have noticed in production two situations that can cause deadlocks and 
> cause scheduling of new containers to come to a halt, especially with regard 
> to applications that have a lot of live containers:
> # When these applications release their containers in bulk.
> # When these applications terminate abruptly due to some failure, the 
> scheduler releases all their live containers in a loop.
> To handle the issues mentioned above, we have a patch in production to make 
> sure ALL container releases happen asynchronously - and it has served us well.
> Opening this JIRA to gather feedback on whether this is a good idea generally (cc 
> [~leftnoteasy], [~jlowe], [~curino], [~kasha], [~subru], [~roniburd])
> BTW, in YARN-6251, we already have an asyncReleaseContainer() in the 
> AbstractYarnScheduler and a corresponding scheduler event, which is currently 
> used specifically for the container-update code paths (where the scheduler 
> releases temp containers which it creates for the update)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590499#comment-16590499
 ] 

Eric Yang commented on YARN-8675:
-

[~billie.rinaldi] When net != host, the AM must set the hostnames for 
containers.  Randomly generated hostnames will not work across nodes.

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590489#comment-16590489
 ] 

Craig Condit commented on YARN-8638:


bq. I think this is the correct way to go and is what we should follow for the 
pluggable runtimes as well. Using {{YARN_CONTAINER_RUNTIME_TYPE}} makes sense 
to me and I agree with your proposed approach.

Perfect. I will prep a new patch and get it uploaded.

As an aside, the javadoc warnings are stumping me. According to everything I 
can find, referencing constants in {{YarnConfiguration}} as we do in 
{{@value}} tags should be legal (and in fact it seems to work in IDEA), but it 
fails when using the maven javadoc plugin. Since we seem to have several other 
occurrences in the same source files, I'm inclined to ignore the additional 
warning for now.
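For reference, the pattern in question is roughly the following (invented class and field names; the real constants live in {{YarnConfiguration}}):

{code:java}
// Invented stand-in for YarnConfiguration; the field must be a compile-time
// constant for {@value} to resolve.
class YarnConfigurationSketch {
  public static final String RUNTIME_ALLOWED_KEY =
      "yarn.nodemanager.runtime.linux.allowed-runtimes";
}

/**
 * Javadoc referencing a constant defined in another class:
 * the allowed-runtimes key is {@value YarnConfigurationSketch#RUNTIME_ALLOWED_KEY}.
 */
class RuntimeDocSketch { }
{code}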


> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590483#comment-16590483
 ] 

Eric Badger commented on YARN-8638:
---

bq. LinuxContainerRuntime is already marked @Private and @Unstable, so I think 
we're covered there. I also agree that requiring implementations to exist 
within "magic" packages is probably counter-productive (more code complexity 
and no real net gain).
I agree with this. The pluggable runtime is a feature here, but I don't think 
it's anything that we need to vow not to break. We can continue to develop the 
interface as we need, and any pluggable runtimes will just need to update when 
they come across those changes. To me, this is the best of both worlds: 
development on trunk is not constrained, but there is also a way to plug in an 
arbitrary runtime that you control (and can modify in the cases where the 
interface changes).

bq. Perhaps we can have the implementation return true if 
YARN_CONTAINER_RUNTIME_TYPE is either unset or explicitly "default"?
Yea, this sounds good to me. I want to make sure that we're consistent about what 
happens when a user asks for a specific runtime and it isn't allowed. The 
current docker behavior is that we will fail the job instead of falling back to 
default. I think this is the correct way to go and is what we should follow for 
the pluggable runtimes as well. Using {{YARN_CONTAINER_RUNTIME_TYPE}} makes 
sense to me and I agree with your proposed approach.
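A minimal sketch of that check for the default runtime, assuming the {{YARN_CONTAINER_RUNTIME_TYPE}} environment variable (illustrative code, not the patch itself):

{code:java}
import java.util.Map;

// Illustrative isRuntimeRequested() for the default runtime: claim the
// container when no runtime type is set, or when "default" is requested
// explicitly. Requests for disallowed runtimes then fail instead of
// silently falling back to default.
class DefaultRuntimeCheckSketch {
  static final String RUNTIME_TYPE_ENV = "YARN_CONTAINER_RUNTIME_TYPE";

  boolean isRuntimeRequested(Map<String, String> env) {
    String type = env.get(RUNTIME_TYPE_ENV);
    return type == null || type.isEmpty() || "default".equals(type);
  }
}
{code}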

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590477#comment-16590477
 ] 

Eric Yang commented on YARN-8638:
-

[~jlowe] {quote}
Unless I'm missing something, the whole point of a pluggable interface in this 
case is to enable runtimes that exist outside of the Apache Hadoop code base. 
There's already a separate property that controls what runtimes are allowed, so 
this plugin support is only for loading classes that aren't known by Hadoop 
when it was compiled. Limiting the classes that will be loaded to a specific 
java package prefix seems arbitrary and won't accomplish the desired effect in 
practice.{quote}

Hadoop made the scheduler pluggable in 2008, and we are still maintaining the 
capacity scheduler and the fair scheduler.  This has provided long-lasting 
development visibility for both models.  However, it can also cause security 
issues if configuration can be overridden to trigger undesired behavior.  I 
don't know if we need to build more security logic here to prevent loading of 
arbitrary classes, but it is worth some effort to discuss this up front. 

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590459#comment-16590459
 ] 

Billie Rinaldi commented on YARN-8675:
--

I guess a remaining question is whether we want the runtime to set a default 
hostname when net != host. When net != host, the container would have a random 
hostname anyway, so perhaps it wouldn't be a problem to have the runtime 
continue setting a containerID-based hostname.

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590436#comment-16590436
 ] 

Billie Rinaldi commented on YARN-8675:
--

I think we're conflating two problems: 1) whether we should allow the hostname 
to be set when net=host, and 2) whether the docker runtime should set a default 
hostname for the container if the AM does not set one. The source of the issue 
is that the runtime is setting a default hostname without populating the 
registry with that hostname to enable DNS lookups.

If we remove the default hostname, then we are delegating the behavior to the 
AM. The AM can set a hostname or not, and if the app needs DNS for a special 
hostname, the AM will have to populate the registry itself. This will preserve 
the current working behavior for the service AM and fix the behavior for other 
AMs that do not set a hostname.
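As a rough sketch of that delegation (hypothetical helper, not the actual DockerLinuxContainerRuntime code; the env keys mirror the documented YARN docker variables): pass a hostname to docker only when the AM explicitly set one, and never under host networking:

{code:java}
import java.util.Map;

// Illustrative policy: decide whether "docker run" should get --hostname.
class HostnamePolicySketch {
  static final String NETWORK_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK";
  static final String HOSTNAME_ENV =
      "YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME";

  // Returns the hostname to set, or null to leave it to docker.
  String hostnameToSet(Map<String, String> env) {
    if ("host".equals(env.get(NETWORK_ENV))) {
      return null; // host networking: the container shares the node's hostname
    }
    String requested = env.get(HOSTNAME_ENV);
    // No runtime default: if the AM did not set a hostname, docker's own
    // (random) hostname is used and the AM owns any registry/DNS entries.
    return (requested == null || requested.isEmpty()) ? null : requested;
  }
}
{code}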

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service and 
> setting the hostname breaks driver/executor communication with docker versions 
> >= 1.13.1, especially with wire-encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service/native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590430#comment-16590430
 ] 

Craig Condit edited comment on YARN-8638 at 8/23/18 4:12 PM:
-

bq. As I see it, the concern here is whether we're ready to mark parts or all 
of the container runtime interface as stable. If we're not then let's just 
admit that, mark the interface as Private, Unstable, Evolving, whatever, and 
note that there's no guarantees that plugins will work as-is from release to 
release for a while until we are ready to mark it Public, Stable, etc.

{{LinuxContainerRuntime}} is already marked {{\@Private}} and {{\@Unstable}}, 
so I think we're covered there. I also agree that requiring implementations to 
exist within "magic" packages is probably counter-productive (more code 
complexity and no real net gain).

[~ebadger], I think I see where you're coming from with 
{{DefaultLinuxContainerRuntime}}'s implementation of 
{{isRuntimeRequested()}}... Perhaps we can have the implementation return true 
if {{YARN_CONTAINER_RUNTIME_TYPE}} is either unset or explicitly "default"? I 
believe this would be more in line with the original intent but also handle 
properly the case where a user requests a particular runtime which has been 
disallowed by an administrator (failing rather than just falling back).


was (Author: ccondit-target):
bq. As I see it, the concern here is whether we're ready to mark parts or all 
of the container runtime interface as stable. If we're not then let's just 
admit that, mark the interface as Private, Unstable, Evolving, whatever, and 
note that there's no guarantees that plugins will work as-is from release to 
release for a while until we are ready to mark it Public, Stable, etc.

{{LinuxContainerRuntime}} is already marked {{\@Private}} and {{\@Unstable}}, 
so I think we're covered there. I also agree that requiring implementations to 
exist within "magic" packages is probably counter-productive (more code 
complexity and no real net gain).

[~ebadger], I think I see where you're coming from with 
{{DefaultLinuxContainerRuntime}}'s implementation of 
{{isRuntimeRequested()}}... Perhaps we can have the implementation return true 
if {YARN_CONTAINER_RUNTIME_TYPE} is either unset or explicitly "default"? I 
believe this would be more in line with the original intent but also handle 
properly the case where a user requests a particular runtime which has been 
disallowed by an administrator (failing rather than just falling back).

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.

[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590430#comment-16590430
 ] 

Craig Condit commented on YARN-8638:


bq. As I see it, the concern here is whether we're ready to mark parts or all 
of the container runtime interface as stable. If we're not then let's just 
admit that, mark the interface as Private, Unstable, Evolving, whatever, and 
note that there's no guarantees that plugins will work as-is from release to 
release for a while until we are ready to mark it Public, Stable, etc.

{{LinuxContainerRuntime}} is already marked {{\@Private}} and {{\@Unstable}}, 
so I think we're covered there. I also agree that requiring implementations to 
exist within "magic" packages is probably counter-productive (more code 
complexity and no real net gain).

[~ebadger], I think I see where you're coming from with 
{{DefaultLinuxContainerRuntime}}'s implementation of 
{{isRuntimeRequested()}}... Perhaps we can have the implementation return true 
if {YARN_CONTAINER_RUNTIME_TYPE} is either unset or explicitly "default"? I 
believe this would be more in line with the original intent but also handle 
properly the case where a user requests a particular runtime which has been 
disallowed by an administrator (failing rather than just falling back).

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-08-23 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590423#comment-16590423
 ] 

Eric Badger commented on YARN-6456:
---

[~ccondit-target], I attached a patch of what we're using to force the 
{{DockerLinuxContainerRuntime}} on nodes that have docker installed. This 
allows rhel6 nodes to continue operating as normal (as rhel6 doesn't have 
docker), while rhel7 nodes will use docker. I would think that this could be 
pretty easily extended to override a single runtime via a config. And then you 
can also override nodes that have docker installed on them to not use docker by 
simply removing docker from the allowed-runtimes in the yarn configuration for 
that node.
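For example, a per-node override along these lines (hypothetical snippet, using the allowed-runtimes property from the YARN-8638 description) would keep a node off the docker runtime entirely:

{code:xml}
<!-- yarn-site.xml on a node that should never use the docker runtime -->
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default</value>
</property>
{code}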

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers; default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most of the cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option that it is not the user but the admin that requires 
> to use Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590416#comment-16590416
 ] 

Jason Lowe edited comment on YARN-8638 at 8/23/18 4:01 PM:
---

bq. It would be better if we only allow loading of container runtime from the 
current package locations and 
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime only.

Unless I'm missing something, the whole point of a pluggable interface in this 
case is to enable runtimes that exist outside of the Apache Hadoop code base.  
There's already a separate property that controls what runtimes are allowed, so 
this plugin support is only for loading classes that aren't known by Hadoop 
when it was compiled.  Limiting the classes that will be loaded to a specific 
java package prefix seems arbitrary and won't accomplish the desired effect in 
practice.  If we change the interface then users who have custom plugins will 
break whether the java package is "correct" or not.

As I see it, the concern here is whether we're ready to mark parts or all of 
the container runtime interface as stable.  If we're not then let's just admit 
that, mark the interface as Private, Unstable, Evolving, whatever, and note 
that there's no guarantees that plugins will work as-is from release to release 
for a while until we are ready to mark it Public, Stable, etc.  Then people who 
want to live on the edge to try out new things, fully knowing they may have to 
rewrite parts of it when moving to new releases, can do so easily without 
shoehorning their plugin into an arbitrary java package prefix.



was (Author: jlowe):
Unless I'm missing something, the whole point of a pluggable interface in this 
case is to enable runtimes that exist outside of the Apache Hadoop code base.  
There's already a separate property that controls what runtimes are allowed, so 
this plugin support is only for loading classes that aren't known by Hadoop 
when it was compiled.  Limiting the classes that will be loaded to a specific 
java package prefix seems arbitrary and won't accomplish the desired effect in 
practice.  If we change the interface then users who have custom plugins will 
break whether the java package is "correct" or not.

As I see it, the concern here is whether we're ready to mark parts or all of 
the container runtime interface as stable.  If we're not then let's just admit 
that, mark the interface as Private, Unstable, Evolving, whatever, and note 
that there's no guarantees that plugins will work as-is from release to release 
for a while until we are ready to mark it Public, Stable, etc.  Then people who 
want to live on the edge to try out new things, fully knowing they may have to 
rewrite parts of it when moving to new releases, can do so easily without 
shoehorning their plugin into an arbitrary java package prefix.


> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.

[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-08-23 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6456:
--
Attachment: YARN-6456-ForceDockerRuntimeIfSupported.patch

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Craig Condit
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6456-ForceDockerRuntimeIfSupported.patch
>
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers; default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most of the cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option that it is not the user but the admin that requires 
> to use Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590416#comment-16590416
 ] 

Jason Lowe commented on YARN-8638:
--

Unless I'm missing something, the whole point of a pluggable interface in this 
case is to enable runtimes that exist outside of the Apache Hadoop code base.  
There's already a separate property that controls what runtimes are allowed, so 
this plugin support is only for loading classes that aren't known by Hadoop 
when it was compiled.  Limiting the classes that will be loaded to a specific 
java package prefix seems arbitrary and won't accomplish the desired effect in 
practice.  If we change the interface then users who have custom plugins will 
break whether the java package is "correct" or not.

As I see it, the concern here is whether we're ready to mark parts or all of 
the container runtime interface as stable.  If we're not then let's just admit 
that, mark the interface as Private, Unstable, Evolving, whatever, and note 
that there's no guarantees that plugins will work as-is from release to release 
for a while until we are ready to mark it Public, Stable, etc.  Then people who 
want to live on the edge to try out new things, fully knowing they may have to 
rewrite parts of it when moving to new releases, can do so easily without 
shoehorning their plugin into an arbitrary java package prefix.
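For reference, marking the interface that way is a one-liner with Hadoop's existing annotations (sketch; {{LinuxContainerRuntime}}'s actual methods are omitted and the interface name here is invented):

{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Sketch: how Hadoop marks an evolving plugin interface so out-of-tree
// implementations know it may change between releases.
@InterfaceAudience.Private
@InterfaceStability.Unstable
interface PluggableRuntimeSketch {
  void initialize() throws Exception;
}
{code}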


> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590412#comment-16590412
 ] 

Eric Badger commented on YARN-8638:
---

bq. Default's logic is as before, but since pluggable runtimes get a chance to 
intercept before default, isRuntimeRequested() will only get called in the same 
circumstances where the original logic applied. Is there a better implementation 
for isRuntimeRequested()?
Yes, this is definitely backwards compatible, and as-is everything works as we 
want it to. But in future development, if someone were to use this method, they 
would assume that calling {{isRuntimeRequested()}} on the default runtime would 
let them know whether that's the runtime that was requested. With this logic, 
that is not true. IMO, {{isRuntimeRequested()}} should be able to tell you, 
standalone, whether that runtime is requested or not. If we want to shortcut it 
by using the fall-through logic of the if statements, that's fine. But then I 
would just say not to call this method and have the method be true to its name. 
In reality, once this patch goes in, the default runtime is only requested if we 
don't ask for Docker _and_ we don't ask for a pluggable runtime. 

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-08-23 Thread Naganarasimha G R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590410#comment-16590410
 ] 

Naganarasimha G R commented on YARN-7863:
-

Thanks [~cheersyang] for the detailed clarification,
{quote} PCs are not associated with resource, so it's like an extra check after 
all other checks are done. The scheduler still calculates how much resource is 
available in a partition for a given queue and assigns resource from a node in 
this partition to a request, but if the PC is not satisfied then the allocation 
proposal will be rejected, etc.
{quote}
Agreed that PCs are not directly involved with resources. But IIUC, with the new 
_SchedulingRequest_ API for allocating containers, a PC is the only way to 
specify partitions, which are tied to resources. This is the reason I had my 
concerns with the OR. Let me go through the code with the new API and understand 
whether the partition is handled for all scenarios. I was always under the 
impression that there would be an explicit constraint for the partition, based 
on which resources are accounted (USED, PENDING, RESERVED, etc.). But if it is 
internal to the constraint, which is also OR'd with the others, I am not 
completely sure it addresses this.
{quote}Partition in PC is not ready; to be honest, I am not sure if everything 
is aligned with existing label-based scheduling. I suggested in YARN-8015 to open 
a separate task to further enhance that.
{quote}
Let me also try to understand the PC and the new API modifications in detail and 
then share my feedback in that JIRA; in any case, it is out of scope for this 
one.

 

 

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7863-YARN-3409.002.patch, 
> YARN-7863-YARN-3409.003.patch, YARN-7863-YARN-3409.004.patch, 
> YARN-7863-YARN-3409.005.patch, YARN-7863-YARN-3409.006.patch, 
> YARN-7863-YARN-3409.007.patch, YARN-7863-YARN-3409.008.patch, 
> YARN-7863.v0.patch
>
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590403#comment-16590403
 ] 

Craig Condit commented on YARN-8638:


[~ebadger], isRuntimeRequested() was added to allow container runtimes to 
assert themselves in a controlled manner. To maintain backwards compatibility, 
the search order is javasandbox, docker, \{pluggable runtimes}, default. The 
default runtime's logic is as before, but since pluggable runtimes get a chance 
to intercept before default, its isRuntimeRequested() will only be called in the 
same circumstances where the original logic applied. Is there a better 
implementation for isRuntimeRequested()?
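
For reference, a rough sketch of that evaluation order (illustrative names only, 
not the actual DelegatingLinuxContainerRuntime code):

{code:java}
import java.util.List;
import java.util.Map;

interface LinuxRuntime {
  boolean isRuntimeRequested(Map<String, String> env);
}

class DelegatingSketch {
  // Ordered as described: javasandbox, docker, pluggable runtimes in
  // allowed-runtimes order, and default last, so default only sees
  // containers that every earlier runtime declined.
  private final List<LinuxRuntime> runtimes;

  DelegatingSketch(List<LinuxRuntime> runtimes) {
    this.runtimes = runtimes;
  }

  LinuxRuntime pick(Map<String, String> env) {
    for (LinuxRuntime r : runtimes) {
      if (r.isRuntimeRequested(env)) {
        return r;
      }
    }
    throw new IllegalStateException("no runtime claimed the container");
  }
}
{code}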

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590390#comment-16590390
 ] 

Eric Yang commented on YARN-8638:
-

[~leftnoteasy] The purpose of this JIRA is to swap the container runtime with 
another implementation, for stability and to prevent duplication of the work 
already done in ContainerLaunch and ContainerExecutor.  The current approach 
looks like a good way to create a pluggable interface for container runtimes.  
[~ccondit-target], it would be better if we only allow loading of container 
runtimes from the current package locations and 
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.  This 
ensures that changes to the interface are visible to the community, and that we 
are not held liable for interface changes that might impact proprietary 
technology.

I am ok to commit this if the javadoc and package loading location are fixed.
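
A hedged sketch of the package restriction suggested above (the class and the 
exact check are illustrative, not a committed implementation):

{code:java}
// Reject pluggable runtime classes configured outside the sanctioned package,
// so interface changes stay visible to the community.
public class RuntimeClassLoaderSketch {

  private static final String ALLOWED_PACKAGE =
      "org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime";

  static Class<?> loadRuntimeClass(String className)
      throws ClassNotFoundException {
    if (!className.startsWith(ALLOWED_PACKAGE + ".")) {
      throw new IllegalArgumentException(
          "Pluggable runtime must live under " + ALLOWED_PACKAGE
              + ", got: " + className);
    }
    return Class.forName(className);
  }
}
{code}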

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590376#comment-16590376
 ] 

Eric Badger commented on YARN-8638:
---

bq. It will be very clean if we can make runc/containerd a separate 
ContainerRuntime implementation. But I am not sure whether all the common logic 
like ContainerLaunch/LinuxContainerExecutor works fine for containerd/runc. If 
involved changes are required, we may have to consider moving the abstraction to 
the ContainerExecutor level, etc.
I think that the only things we will need to change outside of the runtime 
are the Docker lifecycle changes that were made in YARN-5366. 

{noformat}
+  @Override
+  public boolean isRuntimeRequested(Map<String, String> env) {
+    return !DockerLinuxContainerRuntime.isDockerContainerRequested(env);
+  }
+
{noformat}
I understand why you made this change, since it's just replacing the code that 
was already there. But that logic only worked because of the control flow: we 
knew that if the user didn't explicitly ask for docker, they were asking for 
default. Now we have to ask whether they're asking for docker or a pluggable 
runtime. So the function's return value doesn't really make sense outside of the 
logic it's being used in right now. 

Other than that, the code lgtm

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590347#comment-16590347
 ] 

Jason Lowe commented on YARN-8703:
--

The ResourceLocalizedEvent has a local path, so it looks like we can use that 
to issue a delete request to the container executor to remove the localized 
resource the NM is no longer tracking.
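
A sketch of that idea (hypothetical names standing in for the 
LocalResourcesTrackerImpl internals; the real fix would hand the path to the 
DeletionService rather than deleting inline):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

class LocalizedEventSketch {

  private final Map<String, Object> trackedResources;

  LocalizedEventSketch(Map<String, Object> trackedResources) {
    this.trackedResources = trackedResources;
  }

  // On a LOCALIZED event whose resource is no longer tracked, delete the
  // reported path instead of only logging "localized without a location".
  void onLocalized(String resourceKey, Path localPath) throws IOException {
    if (!trackedResources.containsKey(resourceKey)) {
      System.err.println("Localized without a location; removing " + localPath);
      Files.deleteIfExists(localPath);
    }
  }
}
{code}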

> Localized resource may leak on disk if container is killed while localizing
> ---
>
> Key: YARN-8703
> URL: https://issues.apache.org/jira/browse/YARN-8703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jason Lowe
>Priority: Major
>
> If a container is killed while localizing then it releases all of its 
> resources.  If the resource count goes to zero and it is in the DOWNLOADING 
> state then the resource bookkeeping is removed in the resource tracker.  
> Shortly afterwards the localizer could heartbeat in and report the successful 
> localization of the resource that was just removed.  When the 
> LocalResourcesTrackerImpl receives the LOCALIZED event but does not find the 
> corresponding LocalResource for the event then it simply logs a "localized 
> without a location" warning.  At that point I think the localized resource 
> has been leaked on the disk since the NM has removed bookkeeping for the 
> resource without removing it on disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) NPE in localizer hearbeat processing if a container is killed while localizing

2018-08-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590345#comment-16590345
 ] 

Hudson commented on YARN-8649:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14820 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14820/])
YARN-8649. NPE in localizer hearbeat processing if a container is killed 
(jlowe: rev 585ebd873a55bedd2a364d256837f08ada8ba032)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> NPE in localizer hearbeat processing if a container is killed while localizing
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 2.8.5, 3.0.4, 3.1.2
>
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590315#comment-16590315
 ] 

Craig Condit commented on YARN-8638:


[~leftnoteasy], I believe runc support (especially rootless containers) should 
be mostly (if not entirely) possible to implement within the scope of the 
existing ContainerRuntime interface. If that proves to be incorrect, we could 
add additional support elsewhere in a future JIRA.

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8649) NPE in localizer hearbeat processing if a container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-8649:
-
Summary: NPE in localizer hearbeat processing if a container is killed 
while localizing  (was: Similar as YARN-4355:NPE while processing localizer 
heartbeat)

> NPE in localizer hearbeat processing if a container is killed while localizing
> --
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8649) Similar as YARN-4355:NPE while processing localizer heartbeat

2018-08-23 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590302#comment-16590302
 ] 

Jason Lowe commented on YARN-8649:
--

Thanks for updating the patch!  +1 lgtm.  While reviewing, it looks like there's 
a window where we can leak resources on the local disk if containers are killed 
while localizing and we don't manage to kill the localizer before it finishes 
localizing a resource.  Filed YARN-8703 to track that separately.

Committing this.


> Similar as YARN-4355:NPE while processing localizer heartbeat
> -
>
> Key: YARN-8649
> URL: https://issues.apache.org/jira/browse/YARN-8649
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Attachments: YARN-8649.patch, YARN-8649_2.patch, YARN-8649_3.patch, 
> YARN-8649_4.patch, YARN-8649_5.patch, hadoop-hires-nodemanager-hadoop11.log
>
>
> I have noticed that a nodemanager was getting NPEs while tearing down. The 
> reason may be similar to YARN-4355, which was reported by Jason Lowe. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-23 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590307#comment-16590307
 ] 

Craig Condit commented on YARN-8638:


[~csingh], the intent with this patch is to be as simple and unobtrusive as 
possible, so no, there wouldn't be a discovery mechanism, just a jar on the 
classpath.
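
As a usage illustration, a minimal skeleton (hypothetical class with simplified 
signatures; the env key is an assumption) of what such a jar would provide:

{code:java}
package com.somecompany.yarn.runtime;

import java.util.Map;

// Skeleton of a pluggable runtime: a no-arg constructor plus the methods the
// JIRA description mentions. The real class would implement the
// LinuxContainerRuntime interface; it is elided here to keep the sketch
// self-contained.
public class ExperimentalLinuxContainerRuntime {

  public ExperimentalLinuxContainerRuntime() {
    // no-arg constructor: all setup is deferred to initialize()
  }

  public void initialize() {
    // read experimental-runtime configuration here
  }

  public boolean isRuntimeRequested(Map<String, String> env) {
    // claim the container only when explicitly requested
    return "experimental".equals(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}
{code}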

 

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Assignee: Craig Condit
>Priority: Minor
> Attachments: YARN-8638.001.patch, YARN-8638.002.patch
>
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.\{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8703) Localized resource may leak on disk if container is killed while localizing

2018-08-23 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-8703:


 Summary: Localized resource may leak on disk if container is 
killed while localizing
 Key: YARN-8703
 URL: https://issues.apache.org/jira/browse/YARN-8703
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Jason Lowe


If a container is killed while localizing then it releases all of its 
resources.  If the resource count goes to zero and it is in the DOWNLOADING 
state then the resource bookkeeping is removed in the resource tracker.  
Shortly afterwards the localizer could heartbeat in and report the successful 
localization of the resource that was just removed.  When the 
LocalResourcesTrackerImpl receives the LOCALIZED event but does not find the 
corresponding LocalResource for the event then it simply logs a "localized 
without a location" warning.  At that point I think the localized resource has 
been leaked on the disk since the NM has removed bookkeeping for the resource 
without removing it on disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7680) ContainerMetrics is registered even if yarn.nodemanager.container-metrics.enable is set to false

2018-08-23 Thread Zoltan Siegl (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590235#comment-16590235
 ] 

Zoltan Siegl edited comment on YARN-7680 at 8/23/18 2:04 PM:
-

Hey [~ajisakaa]!

I have been trying to fix or reproduce this on apache/trunk without any success.

In the code I have found 4 places where 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics#forContainer(org.apache.hadoop.yarn.api.records.ContainerId,
 long, long) is referenced, and all of those are invoked only if the 
yarn.nodemanager.container-metrics.enable config is set to true.

Steps taken to reproduce:
 * Run a pseudo distributed hadoop cluster
 * {{run hadoop jar 
hadoop-3.2.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 pi 10 10}} to fire up some containers.

If yarn.nodemanager.container-metrics.enable config is set to false I have seen 
no signs of ContainerMetrics being registered whatsoever.

Additionally, I have set an output file via 
{{nodemanager.sink.file.filename=nodemanager-metrics.out}} and grepped it for 
any sign of ContainerMetrics with 
 {{tail -f nodemanager-metrics.out | grep -i ContainerMetrics}}, 
 still without any luck.

Could you provide a way to reproduce this issue?
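
For anyone trying to reproduce, a minimal metrics2 sink configuration along the 
lines I used (illustrative file name; the sink class is the stock FileSink):

{code}
# hadoop-metrics2.properties sketch: route all nodemanager metrics,
# including any registered ContainerMetrics records, to a local file.
nodemanager.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
nodemanager.sink.file.filename=nodemanager-metrics.out
{code}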


was (Author: zsiegl):
Hey [~ajisakaa]!

I have been trying to fix or reproduce this on apache/trunk without any success.

In the code I have found 4 places where 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics#forContainer(org.apache.hadoop.yarn.api.records.ContainerId,
 long, long) is referenced, and all of those are invoked only if the 
yarn.nodemanager.container-metrics.enable config is set to true.

Steps taken to reproduce:
 * Run a pseudo distributed hadoop cluster
 * {{run hadoop jar 
hadoop-3.2.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 pi 10 10}} to fire up some containers.

If yarn.nodemanager.container-metrics.enable config is set to false I have seen 
no signs of ContainerMetrics being registered whatsoever.

Could you provide a way to reproduce this issue?

> ContainerMetrics is registered even if 
> yarn.nodemanager.container-metrics.enable is set to false
> 
>
> Key: YARN-7680
> URL: https://issues.apache.org/jira/browse/YARN-7680
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.0.0
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Critical
>
> ContainerMetrics is unintentionally registered to DefaultMetricsSystem even 
> if yarn.nodemanager.container-metrics.enable is set to false. For example, 
> when we set 
> *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31 to 
> sink all the metrics to Ganglia, MetricsSystem sink ContainerMetrics to 
> ganglia server (localhost:8649 by default).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7773) YARN Federation used Mysql as state store throw exception, Unknown column 'homeSubCluster' in 'field list'

2018-08-23 Thread Y. SREENIVASULU REDDY (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590254#comment-16590254
 ] 

Y. SREENIVASULU REDDY commented on YARN-7773:
-

I found the problem in branch-3.1 as well; it would be good to backport to this branch.

> YARN Federation used Mysql as state store throw exception, Unknown column 
> 'homeSubCluster' in 'field list'
> --
>
> Key: YARN-7773
> URL: https://issues.apache.org/jira/browse/YARN-7773
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0-alpha1, 3.0.0-alpha2, 3.0.0-beta1, 
> 3.0.0-alpha4, 3.0.0-alpha3, 3.0.0
> Environment: Hadoop 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Blocker
>  Labels: patch
> Fix For: 3.2.0
>
> Attachments: YARN-7773.001.patch
>
>
> An error occurred when YARN Federation used MySQL as the state store. The root 
> cause is that the applicationsHomeSubCluster table was created with the column 
> 'subClusterId' while the stored procedure referenced 'homeSubCluster'. I fixed 
> this problem.
>  
> submitApplication appId application_1516277664083_0014 try #0 on SubCluster 
> cluster1 , queue: root.bdp_federation
>  [2018-01-18T23:25:29.325+08:00] [ERROR] 
> store.impl.SQLFederationStateStore.logAndThrowRetriableException(FederationStateStoreUtils.java
>  158) [IPC Server handler 44 on 8050] : Unable to insert the newly generated 
> application application_1516277664083_0014
>  com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown column 
> 'homeSubCluster' in 'field list'
>  at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>  at com.mysql.jdbc.Util.getInstance(Util.java:408)
>  at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
>  at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
>  at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
>  at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
>  at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
>  at 
> com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
>  at 
> com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
>  at 
> com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
>  at 
> com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
>  at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
>  at 
> com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
>  at 
> com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
>  at 
> org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>  at com.sun.proxy.$Proxy31.addApplicationHomeSubCluster(Unknown Source)
>  at 
> org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade.addApplicationHomeSubCluster(FederationStateStoreFacade.java:345)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.JDFederationClientInterceptor.submitApplication(JDFederationClientInterceptor.java:334)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:196)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:218)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:419)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969

[jira] [Commented] (YARN-7680) ContainerMetrics is registered even if yarn.nodemanager.container-metrics.enable is set to false

2018-08-23 Thread Zoltan Siegl (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590235#comment-16590235
 ] 

Zoltan Siegl commented on YARN-7680:


Hey [~ajisakaa]!

I have been trying to fix or reproduce this on apache/trunk without any success.

In the code I have found 4 places where 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics#forContainer(org.apache.hadoop.yarn.api.records.ContainerId,
 long, long) is referenced, and all of those are invoked only if the 
yarn.nodemanager.container-metrics.enable config is set to true.

Steps taken to reproduce:
 * Run a pseudo distributed hadoop cluster
 * {{run hadoop jar 
hadoop-3.2.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar
 pi 10 10}} to fire up some containers.

If yarn.nodemanager.container-metrics.enable config is set to false I have seen 
no signs of ContainerMetrics being registered whatsoever.

Could you provide a way to reproduce this issue?

> ContainerMetrics is registered even if 
> yarn.nodemanager.container-metrics.enable is set to false
> 
>
> Key: YARN-7680
> URL: https://issues.apache.org/jira/browse/YARN-7680
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 3.0.0
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Critical
>
> ContainerMetrics is unintentionally registered to DefaultMetricsSystem even 
> if yarn.nodemanager.container-metrics.enable is set to false. For example, 
> when we set 
> *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31 to 
> sink all the metrics to Ganglia, MetricsSystem sink ContainerMetrics to 
> ganglia server (localhost:8649 by default).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Rakesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Shah updated YARN-8702:
--
Description: 
This UT fails because the container status is not obtained correctly.
{quote}
h3. Error Message

expected:<2> but was:<0>
h3. Stacktrace

java.lang.AssertionError: expected:<2> but was:<0> at 
org.junit.Assert.fail(Assert.java:88) at 
org.junit.Assert.failNotEquals(Assert.java:743) at 
org.junit.Assert.assertEquals(Assert.java:118) 
{quote}

  was:This UT fails because the container status is not obtained correctly.


> TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
> failing randomly
> 
>
> Key: YARN-8702
> URL: https://issues.apache.org/jira/browse/YARN-8702
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: container-queuing
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Rakesh Shah
>Priority: Major
> Fix For: 3.1.1
>
>
> This UT fails because the container status is not obtained correctly.
> {quote}
> h3. Error Message
> expected:<2> but was:<0>
> h3. Stacktrace
> java.lang.AssertionError: expected:<2> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:743) at 
> org.junit.Assert.assertEquals(Assert.java:118) 
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Rakesh Shah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Shah reassigned YARN-8702:
-

Assignee: Rakesh Shah

> TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
> failing randomly
> 
>
> Key: YARN-8702
> URL: https://issues.apache.org/jira/browse/YARN-8702
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: container-queuing
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Assignee: Rakesh Shah
>Priority: Major
> Fix For: 3.1.1
>
>
> This UT fails because the container status is not obtained correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590168#comment-16590168
 ] 

Bibin A Chundatt commented on YARN-8702:


[~Rakesh_Shah] 

Thank you for raising the issue. Added you to the contributor list.

> TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
> failing randomly
> 
>
> Key: YARN-8702
> URL: https://issues.apache.org/jira/browse/YARN-8702
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: container-queuing
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Major
> Fix For: 3.1.1
>
>
> This UT fails because the container status is not obtained correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8702:
---
Affects Version/s: (was: 3.1.0)
   (was: 2.8.3)

> TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
> failing randomly
> 
>
> Key: YARN-8702
> URL: https://issues.apache.org/jira/browse/YARN-8702
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: container-queuing
>Affects Versions: 3.1.1
>Reporter: Rakesh Shah
>Priority: Major
> Fix For: 3.1.1
>
>
> This UT fails because the container status is not obtained correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Rakesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590161#comment-16590161
 ] 

Rakesh Shah commented on YARN-8702:
---

I would like to contribute;

could someone assign it to me?

> TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
> failing randomly
> 
>
> Key: YARN-8702
> URL: https://issues.apache.org/jira/browse/YARN-8702
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: container-queuing
>Affects Versions: 2.8.3, 3.1.0, 3.1.1
>Reporter: Rakesh Shah
>Priority: Major
> Fix For: 3.1.1
>
>
> This UT fails because the container status is not obtained correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf reopened YARN-8675:
---

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with Docker versions 
> >= 1.13.1, especially with wire encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8675) Setting hostname of docker container breaks with "host" networking mode for Apps which do not run as a YARN service

2018-08-23 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590143#comment-16590143
 ] 

Shane Kumpf commented on YARN-8675:
---

Looking back over the history here, I think we made the wrong decision in 
setting the YARN-defined hostname when {{\-\-net=host}}. I think we should make 
{{\-\-net=host}} return the NM hostname, even if Registry DNS is enabled, as 
you originally proposed in YARN-7797 via your early patches, [~eyang]. While 
using the YARN-defined hostname was nice for testing, it breaks several aspects 
of running both Services and "native" Hadoop frameworks, such as MR and Spark, 
side by side, which is a core goal of the containerization effort. 

The problem isn't the domain; it is that the "ctr" hostname we are setting won't 
exist in DNS for these containers. Concretely, the NM will set 
{{\-\-hostname=ctr-e111-111-11-01-06.domain.site}} even though 
that entry will never be available via DNS, since the Spark job is not running 
as a YARN Service and is not writing any entries to ZK. Anything related to that 
container that relies on DNS lookups will fail.

> Setting hostname of docker container breaks with "host" networking mode for 
> Apps which do not run as a YARN service
> ---
>
> Key: YARN-8675
> URL: https://issues.apache.org/jira/browse/YARN-8675
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Major
>  Labels: Docker
>
> Applications like the Spark AM currently do not run as a YARN service, and 
> setting the hostname breaks driver/executor communication with Docker versions 
> >= 1.13.1, especially with wire encryption turned on.
> YARN-8027 sets the hostname if YARN DNS is enabled. But the cluster could 
> have a mix of YARN service and native applications.
> The proposal is to not set the hostname when "host" networking mode is 
> enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8702) TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() failing randomly

2018-08-23 Thread Rakesh Shah (JIRA)
Rakesh Shah created YARN-8702:
-

 Summary: 
TestContainerSchedulerQueuing.testKillOnlyRequiredOpportunisticContainers() 
failing randomly
 Key: YARN-8702
 URL: https://issues.apache.org/jira/browse/YARN-8702
 Project: Hadoop YARN
  Issue Type: Bug
  Components: container-queuing
Affects Versions: 3.1.1, 3.1.0, 2.8.3
Reporter: Rakesh Shah
 Fix For: 3.1.1


This UT fails because the container status is not obtained correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590028#comment-16590028
 ] 

genericqa commented on YARN-8632:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 20s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSRunner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8632 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12936796/YARN-8632.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e40660a5fc15 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1ac0144 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21665/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21665/testReport/ |
| Max. process+thread count | 449 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-Y

[jira] [Comment Edited] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread Xianghao Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589918#comment-16589918
 ] 

Xianghao Lu edited comment on YARN-8632 at 8/23/18 8:49 AM:


It seems that "setUncaughtExceptionHandler" is not suitable in thread pool, 
refer to 
https://issues.apache.org/jira/browse/HADOOP-12748
[http://literatejava.com/threading/silent-thread-death-unhandled-exceptions/|http://literatejava.com/threading/silent-thread-death-unhandled-exceptions/]

Using try catch block to catch Exception may not be a good method but it does 
solve the problem in this jira, but I also find it is used in
[AppLevelAggregator|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/AppLevelTimelineCollectorWithAgg.java#L139]
[EntityLogScanner|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java#L865],
 etc. which are related to function scheduleAtFixedRate.

Fortunately, I noticed HADOOP-12749 HADOOP-14966 provide a nice way to log 
uncaught exception in thread pool, so the patch of branch brunk will be very 
easy.
Attatched YARN-8632.003.patch


was (Author: luxianghao):
It seems that "setUncaughtExceptionHandler" is not suitable in thread pool, 
refer to 
https://issues.apache.org/jira/browse/HADOOP-12748
[http://literatejava.com/threading/silent-thread-death-unhandled-exceptions/|http://example.com]

Using try catch block to catch Exception may not be a good method but it does 
solve the problem in this jira, but I also find it is used in
[AppLevelAggregator|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/AppLevelTimelineCollectorWithAgg.java#L139]
[EntityLogScanner|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java#L865],
 etc. which are related to function scheduleAtFixedRate.

Fortunately, I noticed HADOOP-12749 HADOOP-14966 provide a nice way to log 
uncaught exception in thread pool, so the patch of branch brunk will be very 
easy.
Attatched YARN-8632.003.patch

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered 
> some problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I only got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread 
> exits because of an NPE: "wrapper.getQueueSet()" is still null when 
> executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So we should move "String metrics = web.generateRealTimeTrackingMetrics();" 
> into the try block so that the MetricsLogRunnable thread does not exit with 
> an unexpected exception.
>  My hadoop version is 2.7.2; the hadoop trunk branch also seems to have the 
> second problem, and I have made a patch to solve it.
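
A minimal, self-contained sketch of the fix described above; the class and 
call names follow the jira text, but the web/writer collaborators are stubbed 
here, so this is not the actual SLS source:

{code:java}
import java.io.PrintWriter;

class MetricsLogRunnable implements Runnable {
  // Stand-in for the SLS web component named in the jira text.
  interface TrackingWeb {
    String generateRealTimeTrackingMetrics();
  }

  private final TrackingWeb web;
  private final PrintWriter writer;

  MetricsLogRunnable(TrackingWeb web, PrintWriter writer) {
    this.web = web;
    this.writer = writer;
  }

  @Override
  public void run() {
    try {
      // Previously this call sat outside the try block; the NPE thrown while
      // wrapper.getQueueSet() was still null killed the thread and left
      // realtimetrack.json holding only "[]".
      String metrics = web.generateRealTimeTrackingMetrics();
      writer.println(metrics);
      writer.flush();
    } catch (Exception e) {
      // Log and keep the scheduled task alive instead of dying silently.
      System.err.println("Failed to log real-time tracking metrics: " + e);
    }
  }
}
{code}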



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread Xianghao Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589918#comment-16589918
 ] 

Xianghao Lu commented on YARN-8632:
---

It seems that "setUncaughtExceptionHandler" does not take effect for tasks 
running in a thread pool; refer to 
https://issues.apache.org/jira/browse/HADOOP-12748
[http://literatejava.com/threading/silent-thread-death-unhandled-exceptions/|http://literatejava.com/threading/silent-thread-death-unhandled-exceptions/]

Using a try/catch block to catch Exception may not be the most elegant 
approach, but it does solve the problem in this jira, and I find the same 
pattern in
[AppLevelAggregator|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/AppLevelTimelineCollectorWithAgg.java#L139]
 and 
[EntityLogScanner|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage/src/main/java/org/apache/hadoop/yarn/server/timeline/EntityGroupFSTimelineStore.java#L865],
 etc., all of which are driven by scheduleAtFixedRate.

Fortunately, I noticed HADOOP-12749 and HADOOP-14966 provide a nice way to 
log uncaught exceptions in a thread pool, so the patch for branch trunk will 
be very easy.
Attached YARN-8632.003.patch

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered 
> some problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I only got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread 
> exits because of an NPE: "wrapper.getQueueSet()" is still null when 
> executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So we should move "String metrics = web.generateRealTimeTrackingMetrics();" 
> into the try block so that the MetricsLogRunnable thread does not exit with 
> an unexpected exception.
>  My hadoop version is 2.7.2; the hadoop trunk branch also seems to have the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8691) AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum size

2018-08-23 Thread Yicong Cai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yicong Cai resolved YARN-8691.
--
      Resolution: Duplicate
   Fix Version/s: (was: 2.7.7)
                  3.0.0-alpha4
Target Version/s: (was: 2.7.7)

> AMRMClient unregisterApplicationMaster Api's appMessage should have a maximum 
> size
> --
>
> Key: YARN-8691
> URL: https://issues.apache.org/jira/browse/YARN-8691
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Critical
> Fix For: 3.0.0-alpha4
>
>
> When SparkSQL AM codegen fails, the AM calls the unregister-AM API and 
> sends the error message to the RM, which receives the AM state and updates 
> it in the RMStateStore. The codegen error message can be huge (about 200 MB 
> in our case). If the RMStateStore is ZKRMStateStore, this causes the same 
> exception as YARN-6125, but YARN-6125 does not cover truncating the 
> unregisterApplicationMaster message.
>  
> The SparkSQL codegen error message is shown below:
> 18/08/18 08:34:54 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM 
> limit of 0x
>  /* 001 */ public java.lang.Object generate(Object[] references)
> { /* 002 */ return new SpecificSafeProjection(references); /* 003 */ }
> /* 004 */
>  /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
>  ..
> about 2 million lines.
> ..
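
A hypothetical sketch of the kind of cap the duplicate fix implies (the class 
name and the 64 KB limit below are illustrative assumptions, not the actual 
patch):

{code:java}
// Hypothetical sketch: cap AM diagnostics before persisting them to the
// state store. The 64 KB limit is an assumed value for illustration only.
public class DiagnosticsLimiter {
  private static final int MAX_DIAGNOSTICS_LENGTH = 64 * 1024;

  public static String truncate(String diagnostics) {
    if (diagnostics == null || diagnostics.length() <= MAX_DIAGNOSTICS_LENGTH) {
      return diagnostics;
    }
    // Keep the head of the message, which usually carries the root cause.
    return diagnostics.substring(0, MAX_DIAGNOSTICS_LENGTH) + "\n... (truncated)";
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 100000; i++) {
      sb.append("failed to compile: constant pool has grown past JVM limit\n");
    }
    // Prints a length just over 64 KB instead of several megabytes.
    System.out.println(truncate(sb.toString()).length());
  }
}
{code}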



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1

2018-08-23 Thread Sen Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589909#comment-16589909
 ] 

Sen Zhao commented on YARN-8701:


Hi [~leftnoteasy], [~snemeth], could you give me some advice on this?

> If the single parameter in Resources#createResourceWithSameValue is greater 
> than Integer.MAX_VALUE, then the value of vcores will be -1
> ---
>
> Key: YARN-8701
> URL: https://issues.apache.org/jira/browse/YARN-8701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Sen Zhao
>Assignee: Sen Zhao
>Priority: Major
> Attachments: YARN-8701.001.patch
>
>
> If I configure *MaxResources* in fair-scheduler.xml like this:
> {code}resource1=50{code}
> then in the queue the *MaxResources* value changes to 
> {code}Max Resources: {code}
> I think the value of vcores should be *CLUSTER_VCORES* instead.
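
A minimal sketch of the suspected long-to-int overflow behind the -1 vcores 
value (the cast shown is an assumption inferred from the symptom, not a copy 
of the actual Resources source):

{code:java}
public class VcoresOverflow {
  public static void main(String[] args) {
    long requested = 4294967295L;          // 0xFFFFFFFF, > Integer.MAX_VALUE
    System.out.println((int) requested);   // prints -1: only the low 32 bits survive

    // Clamping before the cast avoids the negative vcores value:
    int clamped = (int) Math.min(requested, Integer.MAX_VALUE);
    System.out.println(clamped);           // prints 2147483647
  }
}
{code}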



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-23 Thread Xianghao Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghao Lu updated YARN-8632:
--
Attachment: YARN-8632.003.patch

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, 
> YARN-8632.002.patch, YARN-8632.003.patch
>
>
> Recently, I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I encountered 
> some problems.
>  Firstly, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Secondly, everything seemed to be ok, but I only got "[]" in the file 
> realtimetrack.json. Finally, I found that the MetricsLogRunnable thread 
> exits because of an NPE: "wrapper.getQueueSet()" is still null when 
> executing "String metrics = web.generateRealTimeTrackingMetrics();".
>  So we should move "String metrics = web.generateRealTimeTrackingMetrics();" 
> into the try block so that the MetricsLogRunnable thread does not exit with 
> an unexpected exception.
>  My hadoop version is 2.7.2; the hadoop trunk branch also seems to have the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org