[jira] [Commented] (YARN-8756) [Submarine] Properly handle relative path for staging area

2018-09-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608736#comment-16608736
 ] 

Hadoop QA commented on YARN-8756:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine 
in trunk has 4 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8756 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939006/YARN-8756.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 49ccf03c06da 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / eef3baf |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/21793/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21793/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8698) [Submarine] Failed to reset Hadoop home environment when submitting a submarine job

2018-09-09 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608729#comment-16608729
 ] 

Hudson commented on YARN-8698:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14910 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14910/])
YARN-8698. [Submarine] Failed to reset Hadoop home environment when (wangda: 
rev de85f841dae2bb22696472d78feb852883761a4e)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine/src/main/java/org/apache/hadoop/yarn/submarine/runtimes/yarnservice/YarnServiceJobSubmitter.java


> [Submarine] Failed to reset Hadoop home environment when submitting a 
> submarine job
> ---
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-09-09 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608717#comment-16608717
 ] 

Sunil Govindan commented on YARN-8657:
--

[~leftnoteasy] Patch looks good to me. Will commit tomorrow if no objections.

cc [~cheersyang]

> User limit calculation should be read-lock-protected within LeafQueue
> -
>
> Key: YARN-8657
> URL: https://issues.apache.org/jira/browse/YARN-8657
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8657.001.patch
>
>
> When async scheduling is enabled, user limit calculation could be wrong: 
> It is possible that scheduler calculated a user_limit, but inside 
> {{canAssignToUser}} it becomes staled. 
> We need to protect user limit calculation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8340) Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more resources enabled.

2018-09-09 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608714#comment-16608714
 ] 

Sunil Govindan commented on YARN-8340:
--

Ping [~leftnoteasy] [~Zian Chen] [~eepayne]. Could we move to 3.3 if this is 
not critical for 3.2

> Capacity Scheduler Intra Queue Preemption Should Work When 3rd or more 
> resources enabled.
> -
>
> Key: YARN-8340
> URL: https://issues.apache.org/jira/browse/YARN-8340
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Refer to comment from [~eepayne] and discussion below that: 
> https://issues.apache.org/jira/browse/YARN-8292?focusedCommentId=16482689=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16482689
>  for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7505) RM REST endpoints generate malformed JSON

2018-09-09 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608711#comment-16608711
 ] 

Sunil Govindan commented on YARN-7505:
--

[~templedf] could u pls help to check this as we are nearing 3.2.0

> RM REST endpoints generate malformed JSON
> -
>
> Key: YARN-7505
> URL: https://issues.apache.org/jira/browse/YARN-7505
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: restapi
>Affects Versions: 3.0.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-7505.001.patch, YARN-7505.002.patch
>
>
> For all endpoints that return DAOs that contain maps, the generated JSON is 
> malformed.  For example:
> % curl 'http://localhost:8088/ws/v1/cluster/apps'
> {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"resourceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608611#comment-16608611
 ] 

Wangda Tan commented on YARN-8757:
--

Working on the patch now, will update patch shortly.

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> We need to have a Tensorboard component when --tensorboard is specified. And 
> we need to set quicklinks to let users view tensorboard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8757:
-
Description: We need to have a Tensorboard component when --tensorboard is 
specified. And we need to set quicklinks to let users view tensorboard.

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> We need to have a Tensorboard component when --tensorboard is specified. And 
> we need to set quicklinks to let users view tensorboard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8757:


 Summary: [Submarine] Add Tensorboard component when --tensorboard 
is specified
 Key: YARN-8757
 URL: https://issues.apache.org/jira/browse/YARN-8757
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8698) [Submarine] Failed to reset Hadoop home environment when submitting a submarine job

2018-09-09 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8698:
-
Summary: [Submarine] Failed to reset Hadoop home environment when 
submitting a submarine job  (was: [Submarine] Failed to add hadoop dependencies 
in docker container when submitting a submarine job)

> [Submarine] Failed to reset Hadoop home environment when submitting a 
> submarine job
> ---
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8756) [Submarine] Properly handle relative path for staging area

2018-09-09 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8756:
-
Attachment: YARN-8756.001.patch

> [Submarine] Properly handle relative path for staging area
> --
>
> Key: YARN-8756
> URL: https://issues.apache.org/jira/browse/YARN-8756
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: submarine
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8756.001.patch
>
>
> While doing tests, I found when a relative path is being specified for 
> checkpoint. The path passed to Tensorflow is wrong. A trick is get a 
> FileStatus before return. Will attach fix soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8756) [Submarine] Properly handle relative path for staging area

2018-09-09 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8756:


 Summary: [Submarine] Properly handle relative path for staging area
 Key: YARN-8756
 URL: https://issues.apache.org/jira/browse/YARN-8756
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: submarine
Reporter: Wangda Tan
Assignee: Wangda Tan


While doing tests, I found when a relative path is being specified for 
checkpoint. The path passed to Tensorflow is wrong. A trick is get a FileStatus 
before return. Will attach fix soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8698) [Submarine] Failed to add hadoop dependencies in docker container when submitting a submarine job

2018-09-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608545#comment-16608545
 ] 

Wangda Tan commented on YARN-8698:
--

 +1, will commit the patch shortly.

Thanks,

> [Submarine] Failed to add hadoop dependencies in docker container when 
> submitting a submarine job
> -
>
> Key: YARN-8698
> URL: https://issues.apache.org/jira/browse/YARN-8698
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zac Zhou
>Priority: Major
> Attachments: YARN-8698.001.patch
>
>
> When a standalone submarine tf job is submitted, the following error is got :
> INFO:tensorflow:image after unit resnet/tower_0/fully_connected/: (?, 11)
>  INFO:tensorflow:Done calling model_fn.
>  INFO:tensorflow:Create CheckpointSaverHook.
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  hdfsBuilderConnect(forceNewInstance=0, nn=submarine, port=0, 
> kerbTicketCachePath=(NULL), userNa
>  me=(NULL)) error:
>  (unable to get root cause for java.lang.NoClassDefFoundError)
>  (unable to get stack trace for java.lang.NoClassDefFoundError)
>  
> This error may be related to hadoop classpath
> Hadoop env variables of launch_container.sh are as follows:
> export HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hadoop/yarn-submarine/conf"}
>  export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/home/hadoop/yarn-submarine"}
>  export HADOOP_HOME=${HADOOP_HOME:-"/home/hadoop/yarn-submarine"}
>  
> run-PRIMARY_WORKER.sh is like:
> export HADOOP_YARN_HOME=
>  export HADOOP_HDFS_HOME=/hadoop-3.1.0
>  export HADOOP_CONF_DIR=$WORK_DIR
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5592) Add support for dynamic resource updates with multiple resource types

2018-09-09 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608423#comment-16608423
 ] 

Manikandan R commented on YARN-5592:


[~sunilg] [~leftnoteasy] Thanks your comments.

{quote}What is the problem we wanna to solve here?{quote}

We want to allow admin to add new resource types, update and delete existing 
resource types in RM dynamically at run time. For example, Lets say RM has 
resources like R1 (Unit is M), R2 (Unit is G), R3. We would like to provide an 
option to add R4 or modify unit of R1 or remove R3. Once this has been done at 
RM side, plan is to implement the same at NM side eventually to make dynamic 
resource types as complete feature.

{quote}It makes more sense to me to restart RMs to take effect of resource 
related changes.{quote}

Is this acceptable for YARN without HA?

> Add support for dynamic resource updates with multiple resource types
> -
>
> Key: YARN-5592
> URL: https://issues.apache.org/jira/browse/YARN-5592
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-5592-design-2.docx
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org