[jira] [Commented] (YARN-6421) Build failure in Yarn-UI on ppc64le

2017-04-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961698#comment-15961698
 ] 

Sunil G commented on YARN-6421:
---

Kicking jenkins again as HADOOP-14285 is committed.

> Build failure in Yarn-UI on ppc64le
> ---
>
> Key: YARN-6421
> URL: https://issues.apache.org/jira/browse/YARN-6421
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0-alpha3
> Environment: Ubuntu 14.04 
> ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>  Labels: ppc64le
> Attachments: YARN-6421.patch
>
>
> The build fails with the following message:
> {code}
> [ERROR] 
> /ws/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node/node:
>  1: 
> /ws/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node/node:ELF:
>  not found
> [ERROR] 
> /ws/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node/node:
>  21: 
> /ws/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/target/src/main/webapp/node/node:
>  Syntax error: ")" unexpected
> {code}
> The error is due to frontend-maven-plugin version 0.0.22, which does not 
> support ppc64le and therefore downloads the x86 build of node. 
> ppc64le support was added to frontend-maven-plugin in version 1.1.






[jira] [Created] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor

2017-04-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6456:


 Summary: Isolation of Docker containers In LinuxContainerExecutor
 Key: YARN-6456
 URL: https://issues.apache.org/jira/browse/YARN-6456
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi


One reason to use Docker containers is to be able to isolate different 
workloads, even if they run as the same user.
I have noticed some issues in the current design:
1. DockerLinuxContainerRuntime mounts containerLocalDirs 
{{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
userLocalDirs {{nm-local-dir/usercache/user/}}, so a container can see and 
modify the files of another container. I think the application file cache 
directory should be enough for the container to run in most cases.
2. The whole cgroups directory is mounted. Would the container directory be 
enough?
3. There is no way to enforce exclusive use of Docker for all containers. There 
should be an option so that the admin, not the user, decides whether Docker 
must be used.







[jira] [Commented] (YARN-6298) Metric preemptCall is not used in new preemption

2017-04-07 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961613#comment-15961613
 ] 

Yufei Gu commented on YARN-6298:


Thanks [~kasha] for the review and commit.

> Metric preemptCall is not used in new preemption
> 
>
> Key: YARN-6298
> URL: https://issues.apache.org/jira/browse/YARN-6298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Blocker
>  Labels: newbie
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6298.001.patch
>
>
> Either get rid of it in Hadoop 3 or use it.






[jira] [Commented] (YARN-6298) Metric preemptCall is not used in new preemption

2017-04-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961609#comment-15961609
 ] 

Hudson commented on YARN-6298:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11553 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11553/])
YARN-6298. Metric preemptCall is not used in new preemption. (Yufei Gu) (kasha: 
rev 2aa8967809b05e558b36ab6fc8c5110ddb723b45)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSOpDurations.java


> Metric preemptCall is not used in new preemption
> 
>
> Key: YARN-6298
> URL: https://issues.apache.org/jira/browse/YARN-6298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Blocker
>  Labels: newbie
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6298.001.patch
>
>
> Either get rid of it in Hadoop 3 or use it.






[jira] [Updated] (YARN-6298) Metric preemptCall is not used in new preemption

2017-04-07 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-6298:
---
Summary: Metric preemptCall is not used in new preemption  (was: Metric 
preemptCall is not used in new preemption.)

> Metric preemptCall is not used in new preemption
> 
>
> Key: YARN-6298
> URL: https://issues.apache.org/jira/browse/YARN-6298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Blocker
>  Labels: newbie
> Attachments: YARN-6298.001.patch
>
>
> Either get rid of it in Hadoop 3 or use it.






[jira] [Updated] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

2017-04-07 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-5531:
---
Attachment: YARN-5531-YARN-2915.v4.patch

UAM pool manager added before UAM

> UnmanagedAM pool manager for federating application across clusters
> ---
>
> Key: YARN-5531
> URL: https://issues.apache.org/jira/browse/YARN-5531
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-5531-YARN-2915.v1.patch, 
> YARN-5531-YARN-2915.v2.patch, YARN-5531-YARN-2915.v3.patch, 
> YARN-5531-YARN-2915.v4.patch
>
>
> One of the main tenets of YARN Federation is to *transparently* scale 
> applications across multiple clusters. This is achieved by running UAMs on 
> behalf of the application on other clusters. This JIRA tracks the addition of 
> an UnmanagedAM pool manager for federating applications across clusters, which 
> will be used by the FederationInterceptor (YARN-3666), part of the 
> AMRMProxy pipeline introduced in YARN-2884.






[jira] [Commented] (YARN-6298) Metric preemptCall is not used in new preemption.

2017-04-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961541#comment-15961541
 ] 

Karthik Kambatla commented on YARN-6298:


Unfortunately, the patch doesn't apply cleanly anymore. [~yufeigu] - mind 
updating it? 

> Metric preemptCall is not used in new preemption.
> -
>
> Key: YARN-6298
> URL: https://issues.apache.org/jira/browse/YARN-6298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Blocker
>  Labels: newbie
> Attachments: YARN-6298.001.patch
>
>
> Either get rid of it in Hadoop 3 or use it.






[jira] [Commented] (YARN-3742) YARN RM will shut down if ZKClient creation times out

2017-04-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961537#comment-15961537
 ] 

Karthik Kambatla commented on YARN-3742:


I like the clarity in the log messages. However, the cases and if-else blocks 
can be improved by setting the message at the source of the event. Also, the 
default case doesn't check for HA.
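
A minimal sketch of the "set the message at the source" idea (the class and 
method names here are hypothetical, not the actual RMFatalEvent API):

{code}
// Hypothetical sketch: the component that raises the fatal event attaches its
// own explanation, so the handler only logs it instead of rebuilding the
// message in a case/if-else chain.
public class FatalEvent {
  private final String explanation;

  public FatalEvent(String explanation) {
    this.explanation = explanation;
  }

  public String getExplanation() {
    return explanation;
  }
}

// Raiser (e.g., the state store):
//   handler.handle(new FatalEvent("Wait for ZKClient creation timed out"));
// Handler:
//   LOG.fatal("Received a fatal event: " + event.getExplanation());
{code}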

> YARN RM  will shut down if ZKClient creation times out 
> ---
>
> Key: YARN-3742
> URL: https://issues.apache.org/jira/browse/YARN-3742
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Daniel Templeton
> Attachments: YARN-3742.001.patch
>
>
> The RM goes down showing the following stacktrace if the ZK client connection 
> fails to be created. We should not exit; instead, the RM should transition to 
> Standby, stop its active duties, and let the other RM take over.
> {code}
> 2015-04-19 01:22:20,513  FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> java.io.IOException: Wait for ZKClient creation timed out
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1066)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1090)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:996)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationStateInternal(ZKRMStateStore.java:643)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:147)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:806)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:879)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:874)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>   at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (YARN-6298) Metric preemptCall is not used in new preemption.

2017-04-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961527#comment-15961527
 ] 

Karthik Kambatla commented on YARN-6298:


+1. Checking this in. 

> Metric preemptCall is not used in new preemption.
> -
>
> Key: YARN-6298
> URL: https://issues.apache.org/jira/browse/YARN-6298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Blocker
>  Labels: newbie
> Attachments: YARN-6298.001.patch
>
>
> Either get rid of it in Hadoop 3 or use it.






[jira] [Assigned] (YARN-6410) FSContext.scheduler should be final

2017-04-07 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-6410:
--

Assignee: Daniel Templeton

> FSContext.scheduler should be final
> ---
>
> Key: YARN-6410
> URL: https://issues.apache.org/jira/browse/YARN-6410
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
>  Labels: newbie
>
> {code}
>   private FairScheduler scheduler;
> {code}






[jira] [Commented] (YARN-6433) Only accessible cgroup mount directories should be selected for a controller

2017-04-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961524#comment-15961524
 ] 

Karthik Kambatla commented on YARN-6433:


Patch looks reasonable. Some minor comments:
# File has a {{canRead()}} method. We don't need to use the FileUtil one. 
# In the test, cpuMtabContentMissing should have "cpu" instead of "cp"?
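
On the first point, a minimal illustration (assuming plain {{java.io.File}} 
readability semantics are sufficient here; the path is made up for the example):

{code}
import java.io.File;

public class CanReadExample {
  public static void main(String[] args) {
    // java.io.File already exposes a readability check, so the FileUtil
    // variant is not needed for this case.
    File cgroupDir = new File("/sys/fs/cgroup/cpu,cpuacct");
    System.out.println("readable: " + cgroupDir.canRead());
  }
}
{code}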

> Only accessible cgroup mount directories should be selected for a controller
> 
>
> Key: YARN-6433
> URL: https://issues.apache.org/jira/browse/YARN-6433
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6433.000.patch
>
>
> I have an Ubuntu 16 box that returns the following error with pre-mounted 
> cgroups on the latest trunk:
> {code}
> 2017-04-03 19:42:18,511 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Cgroups not accessible /run/lxcfs/controllers/cpu,cpuacct
> {code}
> The version is:
> {code}
> $ uname -a
> Linux mybox 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 
> x86_64 x86_64 x86_64 GNU/Linux
> {code}
> The following cpu cgroup filesystems are mounted:
> {code}
> $ grep cpuacct /etc/mtab
> cgroup /sys/fs/cgroup/cpu,cpuacct cgroup 
> rw,nosuid,nodev,noexec,relatime,cpu,cpuacct,nsroot=/ 0 0
> cpu,cpuacct /run/lxcfs/controllers/cpu,cpuacct cgroup 
> rw,relatime,cpu,cpuacct,nsroot=/ 0 0
> {code}
> /sys/fs/cgroup is accessible to my yarn user, so it should be selected 
> instead of /run/lxcfs/controllers






[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants

2017-04-07 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961521#comment-15961521
 ] 

Carlo Curino commented on YARN-6451:


[~jlowe] I think the inline asserts might be useful, particularly for simple 
"range" type checks (like the silly examples I have so far). 
One concern is whether the checks performed for each state transition of 
the metrics become too expensive. The other issue
is that while certain invariants are universally true, others might be 
deployment-specific, and by having them externally loaded/configured
like I do in this example, it is easier to customize them for a specific 
workload. E.g., in some of our clusters apps are self-throttling,
and when they ask for containers they should receive them very quickly, so 
we would like to establish an invariant on allocation
latency, which cannot be asserted generally.

All in all, I would like to foster more invariant checking in our codebase as 
a way to complement more specific unit tests---this
little patch is a step in that direction.
In particular, given the work done in SLS, I think we can easily have 
integration tests that run large portions of the codebase (e.g., the RM),
simulate a large workload, and check that important invariants (including 
complex ones like the one you mentioned) are respected. 
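
As a rough illustration of the externally configured, periodically evaluated 
checks described above (all names and the invariant format are hypothetical, 
not the actual patch):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical periodic invariant monitor fed by the metrics layer.
public class InvariantMonitor implements Runnable {
  private final Map<String, Long> observed = new ConcurrentHashMap<>();

  public void report(String metric, long value) {
    observed.put(metric, value);
  }

  @Override
  public void run() {
    // Example deployment-specific invariant: pending memory is never negative.
    Long pendingMB = observed.get("PendingMB");
    if (pendingMB != null && pendingMB < 0) {
      System.err.println("Invariant violated: PendingMB = " + pendingMB);
    }
  }
}
{code}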


> Create a monitor to check whether we maintain RM (scheduling) invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc..)






[jira] [Commented] (YARN-6368) Decommissioning an NM results in a -1 exit code

2017-04-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961494#comment-15961494
 ] 

Hudson commented on YARN-6368:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11550 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11550/])
YARN-6368. Decommissioning an NM results in a -1 exit code (rkanter: rev 
63f7322522e0ab223ceb91440636eb62ca0a3e41)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java


> Decommissioning an NM results in a -1 exit code
> ---
>
> Key: YARN-6368
> URL: https://issues.apache.org/jira/browse/YARN-6368
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-6368.000.patch, YARN-6368.001.patch, 
> YARN-6368.002.patch, YARN-6368.003.patch, YARN-6368.004.patch
>
>
> In NodeManager.java we should exit normally in case the RM shuts down the 
> node:
> {code}
> } finally {
>   if (shouldExitOnShutdownEvent
>   && !ShutdownHookManager.get().isShutdownInProgress()) {
> ExitUtil.terminate(-1);
>   }
> }
> {code}
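
One possible shape of the change, sketched only (the {{exitedGracefully}} flag 
is hypothetical and the committed patch may differ):

{code}
} finally {
  if (shouldExitOnShutdownEvent
      && !ShutdownHookManager.get().isShutdownInProgress()) {
    // Exit with 0 when the RM asked the node to shut down (e.g., decommission)
    // and keep the error code only for genuinely abnormal exits.
    ExitUtil.terminate(exitedGracefully ? 0 : -1);
  }
}
{code}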






[jira] [Commented] (YARN-6164) Expose maximum-am-resource-percent in YarnClient

2017-04-07 Thread Benson Qiu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961450#comment-15961450
 ] 

Benson Qiu commented on YARN-6164:
--

[~sunilg]: Mind taking a look at the latest patch?

> Expose maximum-am-resource-percent in YarnClient
> 
>
> Key: YARN-6164
> URL: https://issues.apache.org/jira/browse/YARN-6164
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.2
>Reporter: Benson Qiu
>Assignee: Benson Qiu
> Attachments: YARN-6164.001.patch, YARN-6164.002.patch, 
> YARN-6164.003.patch, YARN-6164.004.patch, YARN-6164.005.patch, 
> YARN-6164.006.patch, YARN-6164.007.patch, YARN-6164.008.patch
>
>
> `yarn.scheduler.capacity.maximum-am-resource-percent` is exposed through the 
> [Cluster Scheduler 
> API|http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API],
>  but not through 
> [YarnClient|https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/client/api/YarnClient.html].
> Since YarnClient and RM REST APIs depend on different ports (8032 vs 8088 by 
> default), it would be nice to expose `maximum-am-resource-percent` in 
> YarnClient as well. 
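
For context, a minimal sketch of how a client reads queue information through 
YarnClient today; the proposal would surface the AM resource percent along the 
same path (the getter named in the comment is hypothetical):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class QueueInfoExample {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    QueueInfo root = yarnClient.getQueueInfo("root");
    System.out.println("capacity = " + root.getCapacity());
    // Proposed addition (hypothetical): root.getMaximumAMResourcePercent();

    yarnClient.stop();
  }
}
{code}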






[jira] [Commented] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-04-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961235#comment-15961235
 ] 

Eric Payne commented on YARN-2113:
--

Thanks [~sunilg] for the update. I am still reviewing the patch, but I have a 
couple of minor things to say right now:


TestProportionalCapacityPreemptionPolicyIntraQueueUserLimit.java:
- Several comments in this file say "... assume minimum user limit factor is 
...". This should say "minimum user limit percent". This is a small thing, but 
to avoid confusion, I think it's important to make the distinction between 
minimum user limit percent and user limit factor.
{code}
 * Consider 2 users in a queue, assume minimum user limit factor is 50%.
{code}
- In several tests, the {{root}} queue does not have values that equal the 
sums of the subqueues. For example, in the code below, shouldn't used be 100 
and pending be 50? Also, the comment says {{b}} instead of {{a}} for the 
queue name.
{code}
String queuesConfig =
// guaranteed,max,used,pending,reserved
"root(=[100 100 55 170 0]);" + // root
"-a(=[100 100 100 30 0])"; // b
{code}
- Also, the {{appsConfig}} often doesn't add up to the correct used numbers in 
the {{queuesConfig}}


> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Attachments: 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Commented] (YARN-5797) Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches

2017-04-07 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961163#comment-15961163
 ] 

Chris Trezzo commented on YARN-5797:


Thanks everyone for the reviews and [~mingma] for the commit!

> Add metrics to the node manager for cleaning the PUBLIC and PRIVATE caches
> --
>
> Key: YARN-5797
> URL: https://issues.apache.org/jira/browse/YARN-5797
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.9.0, 3.0.0-alpha3
>
> Attachments: YARN-5797-branch-2.001.patch, YARN-5797-trunk.002.patch, 
> YARN-5797-trunk.003.patch, YARN-5797-trunk-v1.patch
>
>
> Add new metrics to the node manager around the local cache sizes and how much 
> is being cleaned from them on a regular bases. For example, we can expose 
> information contained in the {{LocalCacheCleanerStats}} class.






[jira] [Updated] (YARN-6368) Decommissioning an NM results in a -1 exit code

2017-04-07 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-6368:
-
Attachment: YARN-6368.004.patch

Adding the final to the pach for branch-2.

> Decommissioning an NM results in a -1 exit code
> ---
>
> Key: YARN-6368
> URL: https://issues.apache.org/jira/browse/YARN-6368
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-6368.000.patch, YARN-6368.001.patch, 
> YARN-6368.002.patch, YARN-6368.003.patch, YARN-6368.004.patch
>
>
> In NodeManager.java we should exit normally in case the RM shuts down the 
> node:
> {code}
> } finally {
>   if (shouldExitOnShutdownEvent
>   && !ShutdownHookManager.get().isShutdownInProgress()) {
> ExitUtil.terminate(-1);
>   }
> }
> {code}






[jira] [Comment Edited] (YARN-6368) Decommissioning an NM results in a -1 exit code

2017-04-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961154#comment-15961154
 ] 

Miklos Szegedi edited comment on YARN-6368 at 4/7/17 5:29 PM:
--

Adding the final to the patch for branch-2.


was (Author: miklos.szeg...@cloudera.com):
Adding the final to the pach for branch-2.

> Decommissioning an NM results in a -1 exit code
> ---
>
> Key: YARN-6368
> URL: https://issues.apache.org/jira/browse/YARN-6368
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-6368.000.patch, YARN-6368.001.patch, 
> YARN-6368.002.patch, YARN-6368.003.patch, YARN-6368.004.patch
>
>
> In NodeManager.java we should exit normally in case the RM shuts down the 
> node:
> {code}
> } finally {
>   if (shouldExitOnShutdownEvent
>   && !ShutdownHookManager.get().isShutdownInProgress()) {
> ExitUtil.terminate(-1);
>   }
> }
> {code}






[jira] [Commented] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-04-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961043#comment-15961043
 ] 

Hadoop QA commented on YARN-6091:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
59s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6091 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12862495/YARN-6091.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux bef3f2f98758 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ad24464 |
| Default Java | 1.8.0_121 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/15558/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/15558/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the 
> AppMaster fails to register with the ResourceManager, while on other servers 
> this does not happen. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns nonzero, 
> and YARN will remove the AMRMToken, then the 

[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2017-04-07 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961027#comment-15961027
 ] 

Daniel Templeton commented on YARN-2962:


A couple of quick questions:

# Why are you removing the {{rm1.close()}} and {{rm2.close()}} in 
{{TestZKRMStateStore.testFencing()}}?
# Why drop this log line {{LOG.info("Unknown child node with name: " + 
childNodeName);}} in {{loadRMAppState()}}?

Also, two super minor quibbles:

{code}
  private static RMApp createMockAppForRemove(ApplicationId appId,
  ApplicationAttemptId...attemptIds) {
{code} should have a space after the ellipsis.

In {{yarn-default.xml}}, the first line should have a space before the 
parenthesis.

If you don't mind posting a patch to fix the spaces, that would be great.  If 
not, I'll file a couple newbie JIRAs to fix them post-commit.

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: YARN-2962.006.patch, YARN-2962.007.patch, 
> YARN-2962.008.patch, YARN-2962.008.patch, YARN-2962.009.patch, 
> YARN-2962.010.patch, YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.05.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.






[jira] [Commented] (YARN-6451) Create a monitor to check whether we maintain RM (scheduling) invariants

2017-04-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961022#comment-15961022
 ] 

Jason Lowe commented on YARN-6451:
--

Interesting idea.  For some of these invariants, would it make more sense to 
put an assert-like hook in the metric code itself?  I'm thinking why hope that 
a periodic interval happens to catch the metric being negative when we can have 
the metric itself protest when someone tries to set it below zero?  As a bonus, 
we'd have access to the stacktrace that triggered it.

I could see this periodic approach being really useful for more complicated 
expressions like validating stats across users, across queues, etc. where it's 
tricky/expensive to evaluate it on a single metric update.
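
A tiny sketch of the assert-like hook idea (a hypothetical gauge class, not 
the actual metrics2 API):

{code}
// Hypothetical gauge that protests as soon as it is driven negative,
// capturing the stack trace at the point of the bad update.
public class GuardedGauge {
  private long value;

  public synchronized void set(long newValue) {
    if (newValue < 0) {
      new IllegalStateException("Metric driven negative: " + newValue)
          .printStackTrace();
    }
    value = newValue;
  }

  public synchronized long get() {
    return value;
  }
}
{code}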

> Create a monitor to check whether we maintain RM (scheduling) invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc..)






[jira] [Comment Edited] (YARN-6277) Nodemanager heap memory leak

2017-04-07 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961013#comment-15961013
 ] 

Haibo Chen edited comment on YARN-6277 at 4/7/17 4:01 PM:
--

[~Feng Yuan] If the cache is enabled, there shouldn't be massive LocalFileSystem 
instances, unless I am missing something.  For a given key, 
LocalFileSystem.NAME in this case, there is going to be just one instance in 
the cache.


was (Author: haibochen):
[~Feng Yuan] If cached is enabled, there shouldn't be massive LocalFileSystem 
instances, unless I am missing something

> Nodemanager heap memory leak
> 
>
> Key: YARN-6277
> URL: https://issues.apache.org/jira/browse/YARN-6277
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Feng Yuan
>Assignee: Feng Yuan
> Attachments: YARN-6277.branch-2.8.001.patch
>
>
> Because of the LocalDirHandlerService/LocalDirAllocator mechanism, massive 
> numbers of LocalFileSystem instances will be created, leading to a heap leak.






[jira] [Commented] (YARN-6277) Nodemanager heap memory leak

2017-04-07 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961013#comment-15961013
 ] 

Haibo Chen commented on YARN-6277:
--

[~Feng Yuan] If the cache is enabled, there shouldn't be massive LocalFileSystem 
instances, unless I am missing something.
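
To illustrate the caching behavior under discussion, a minimal sketch 
(assuming the default FileSystem cache is in effect):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;

public class FsCacheExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // With the cache enabled (the default), repeated lookups return the same
    // LocalFileSystem instance rather than creating new objects each time.
    LocalFileSystem fs1 = FileSystem.getLocal(conf);
    LocalFileSystem fs2 = FileSystem.getLocal(conf);
    System.out.println("same cached instance: " + (fs1 == fs2));
  }
}
{code}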

> Nodemanager heap memory leak
> 
>
> Key: YARN-6277
> URL: https://issues.apache.org/jira/browse/YARN-6277
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Feng Yuan
>Assignee: Feng Yuan
> Attachments: YARN-6277.branch-2.8.001.patch
>
>
> Because of the LocalDirHandlerService/LocalDirAllocator mechanism, massive 
> numbers of LocalFileSystem instances will be created, leading to a heap leak.






[jira] [Commented] (YARN-6195) Export UsedCapacity and AbsoluteUsedCapacity to JMX

2017-04-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960991#comment-15960991
 ] 

Jason Lowe commented on YARN-6195:
--

Thanks for updating the patch!  At first I thought we had a potential for an 
NPE in CSQueueUtils since there's this comment at the top:
{code}
  /**
   * Update partitioned resource usage, if nodePartition == null, will update
   * used resource for all partitions of this queue.
   */
  public static void updateUsedCapacity(final ResourceCalculator rc,
{code}

However in practice all the callers translate a null label to the no label enum 
so we're good.  Looks like we'd have NPE problems even before this patch if 
nodePartition really was null, so that's a bad comment unrelated to this patch.

+1 for the latest patch.  I'll commit this early next week if there are no 
objections.
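
For reference, a sketch of the caller-side normalization being described 
(assuming the usual empty-string "no label" convention; exact call sites may 
differ):

{code}
// Callers translate a null partition before reaching CSQueueUtils, roughly:
private static String normalizePartition(String nodePartition) {
  final String NO_LABEL = "";  // the "no label" partition
  return (nodePartition == null) ? NO_LABEL : nodePartition;
}
// so updateUsedCapacity(...) never actually receives a null nodePartition.
{code}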

> Export UsedCapacity and AbsoluteUsedCapacity to JMX
> ---
>
> Key: YARN-6195
> URL: https://issues.apache.org/jira/browse/YARN-6195
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, metrics, yarn
>Affects Versions: 3.0.0-alpha3
>Reporter: Benson Qiu
>Assignee: Benson Qiu
> Attachments: YARN-6195.001.patch, YARN-6195.002.patch, 
> YARN-6195.003.patch, YARN-6195.004.patch, YARN-6195.005.patch
>
>
> `usedCapacity` and `absoluteUsedCapacity` are currently not available as JMX. 






[jira] [Assigned] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-04-07 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned YARN-6091:
-

Assignee: Eric Badger

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the 
> AppMaster fails to register with the ResourceManager, while on other servers 
> this does not happen. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns nonzero 
> and then removes the AMRMToken, the AppMaster registration fails because the 
> ResourceManager has already removed this application's token. 
> In container-executor.c, the check is whether the return code is zero. But 
> according to the pclose man page, only a return value of -1 indicates an 
> error. So I changed the check, which solves this problem. 






[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2017-04-07 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6091:
--
Attachment: YARN-6091.001.patch

Uploading a patch that redirects stdout to /dev/null for the popen commands 
whose stdout we don't read. I tested this change locally leveraging the 
test-container-executor program. When you don't redirect stdout you get a 
return 13, which means SIGPIPE. When you do redirect, you get a return value of 
0. 

However, test-container-executor doesn't run without error, so I had to work 
around errors and comment out large pieces of the code to test the relevant 
section in launch_docker_container_as_user that uses popen()/pclose(). We 
might need a new JIRA to fix test-container-executor. I'm not sure what the 
plan for it is going forward, since it's not a part of the maven testing. At 
this point, it's clearly out of sync, so we need to either scrap it or maintain 
it. 

For context, I tried to run test-container-executor on both macOS and rhel7 and 
they both failed in separate ways. 

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Priority: Critical
> Attachments: YARN-6091.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the 
> AppMaster fails to register with the ResourceManager, while on other servers 
> this does not happen. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is running 
> normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns nonzero 
> and then removes the AMRMToken, the AppMaster registration fails because the 
> ResourceManager has already removed this application's token. 
> In container-executor.c, the check is whether the return code is zero. But 
> according to the pclose man page, only a return value of -1 indicates an 
> error. So I changed the check, which solves this problem. 






[jira] [Commented] (YARN-6443) Allow for Priority order relaxing in favor of improved node/rack locality

2017-04-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960755#comment-15960755
 ] 

Jason Lowe commented on YARN-6443:
--

Ah, so this apparently is describing a problem that only can occur if scheduler 
keys are being used?  I'm not sure we need a flag here.  Seems like we simply 
should not guarantee that allocations are returned within a priority group in 
the order they are requested -- they can be returned in any order.  It 
certainly worked that way without scheduler keys.  If you need ordering, that's 
what priorities are for.  In that sense I see this not as an enhancement but 
rather a bugfix.  Or am I misunderstanding the problem?

> Allow for Priority order relaxing in favor of improved node/rack locality 
> --
>
> Key: YARN-6443
> URL: https://issues.apache.org/jira/browse/YARN-6443
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, fairscheduler
>Reporter: Arun Suresh
>Assignee: Hitesh Sharma
>
> Currently the Schedulers examine an application's pending Requests in Priority 
> order. This JIRA proposes to introduce a flag (either via 
> ApplicationMasterService::registerApplication() or via some Scheduler 
> configuration) to favor an ordering that is biased toward the node that is 
> currently heartbeating, by relaxing the priority constraint.






[jira] [Commented] (YARN-6078) Containers stuck in Localizing state

2017-04-07 Thread Feng Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960466#comment-15960466
 ] 

Feng Yuan commented on YARN-6078:
-

The NM stack trace seems to indicate that the LocalizerRunner thread is stuck 
reading from the pipe attached to the process's stdout.
Can you make sure the container-localizer process has started?

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER (which the localizer starts). It 
> seems that the output stream pipe of the process is still not closed (even 
> though the localizer process is no longer present).






[jira] [Commented] (YARN-6258) localBaseAddress for CORS proxy configuration is not working when suffixed with forward slash in new YARN UI.

2017-04-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960359#comment-15960359
 ] 

Hudson commented on YARN-6258:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11546 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11546/])
YARN-6258. localBaseAddress for CORS proxy configuration is not working 
(sunilg: rev ad24464be85bf37bf1677da1a2701a6acd7999d2)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/services/hosts.js


> localBaseAddress for CORS proxy configuration is not working when suffixed 
> with forward slash in new YARN UI.
> -
>
> Key: YARN-6258
> URL: https://issues.apache.org/jira/browse/YARN-6258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Novák
>Assignee: Gergely Novák
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6258.001.patch
>
>
> If CORS proxy is configured for development purposes, all the yarn-node sites 
> (yarn-node, yarn-node-apps, yarn-node-containers) throw an error:
> {noformat}
> Error: Adapter operation failed
> at ember$data$lib$adapters$errors$$AdapterError.EmberError 
> (ember.debug.js:15860)
> at ember$data$lib$adapters$errors$$AdapterError (errors.js:19)
> at Class.handleResponse (rest-adapter.js:677)
> at Class.hash.error (rest-adapter.js:757)
> at fire (jquery.js:3099)
> at Object.fireWith [as rejectWith] (jquery.js:3211)
> at done (jquery.js:8266)
> at XMLHttpRequest. (jquery.js:8605)
> {noformat}
> This might be caused by a bad request url: 
> "http://localhost:1337/{color:red}/{color}192.168.0.104:8042/ws/v1/node".






[jira] [Updated] (YARN-6258) localBaseAddress for CORS proxy configuration is not working when suffixed with forward slash in new YARN UI.

2017-04-07 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6258:
--
Summary: localBaseAddress for CORS proxy configuration is not working when 
suffixed with forward slash in new YARN UI.  (was: The node sites don't work 
with local (CORS) setup for new UI)

> localBaseAddress for CORS proxy configuration is not working when suffixed 
> with forward slash in new YARN UI.
> -
>
> Key: YARN-6258
> URL: https://issues.apache.org/jira/browse/YARN-6258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Novák
>Assignee: Gergely Novák
> Attachments: YARN-6258.001.patch
>
>
> If CORS proxy is configured for development purposes, all the yarn-node sites 
> (yarn-node, yarn-node-apps, yarn-node-containers) throw an error:
> {noformat}
> Error: Adapter operation failed
> at ember$data$lib$adapters$errors$$AdapterError.EmberError 
> (ember.debug.js:15860)
> at ember$data$lib$adapters$errors$$AdapterError (errors.js:19)
> at Class.handleResponse (rest-adapter.js:677)
> at Class.hash.error (rest-adapter.js:757)
> at fire (jquery.js:3099)
> at Object.fireWith [as rejectWith] (jquery.js:3211)
> at done (jquery.js:8266)
> at XMLHttpRequest. (jquery.js:8605)
> {noformat}
> This might be caused by a bad request url: 
> "http://localhost:1337/{color:red}/{color}192.168.0.104:8042/ws/v1/node".






[jira] [Commented] (YARN-6258) The node sites don't work with local (CORS) setup for new UI

2017-04-07 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960337#comment-15960337
 ] 

Sunil G commented on YARN-6258:
---

Fell off the radar. Committing now.

> The node sites don't work with local (CORS) setup for new UI
> 
>
> Key: YARN-6258
> URL: https://issues.apache.org/jira/browse/YARN-6258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gergely Novák
>Assignee: Gergely Novák
> Attachments: YARN-6258.001.patch
>
>
> If CORS proxy is configured for development purposes, all the yarn-node sites 
> (yarn-node, yarn-node-apps, yarn-node-containers) throw an error:
> {noformat}
> Error: Adapter operation failed
> at ember$data$lib$adapters$errors$$AdapterError.EmberError 
> (ember.debug.js:15860)
> at ember$data$lib$adapters$errors$$AdapterError (errors.js:19)
> at Class.handleResponse (rest-adapter.js:677)
> at Class.hash.error (rest-adapter.js:757)
> at fire (jquery.js:3099)
> at Object.fireWith [as rejectWith] (jquery.js:3211)
> at done (jquery.js:8266)
> at XMLHttpRequest. (jquery.js:8605)
> {noformat}
> This might be caused by a bad request url: 
> "http://localhost:1337/{color:red}/{color}192.168.0.104:8042/ws/v1/node".






[jira] [Commented] (YARN-6449) Enable YARN to accept jobs with < 1 core allocations

2017-04-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960326#comment-15960326
 ] 

Allen Wittenauer commented on YARN-6449:


bq. I realize this topic has been discussed in detail but thought it might 
warrant another thought

If you want to do another discussion, do it on YARN-972 so that we don't need 
to rehash this topic again in a different place.

Closing as a dupe.

> Enable YARN to accept jobs with < 1 core allocations
> 
>
> Key: YARN-6449
> URL: https://issues.apache.org/jira/browse/YARN-6449
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Daniel Tomes
>  Labels: features, performance
>
> Product Enhancement Request
> In Spark/HIVE/etc. I often need to complete work for which an entire core is 
> overkill such as managing a JDBC connection or doing a simple map/transform; 
> however, when I do this on large datasets, 1 core X 500 partitions/mappers 
> winds up with quite the cluster level footprint even though most of those 
> processor cycles are idle.
> I propose that we enable YARN to allow a user to submit jobs that "allocate < 
> 1 core". Under the covers, the JVM will still receive one core but YARN/ZK 
> could keep track of the fractions of cores being used and allow other jobs to 
> consume the same core twice provided that both jobs were submitted with <= .5 
> cores. Now, YARN can more effectively utilize multi-threading and decrease 
> CPU idle for the power users.
> Obviously this can ultimately result in very bad outcomes, but if we also 
> enable security controls, then customers can configure things so that only 
> admins/gates can submit with < 1 full core, ultimately resulting in a 
> cluster that can do more.
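
For context on why fractional cores cannot be expressed today, a minimal 
sketch: YARN resource requests carry virtual cores as an integer.

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class ResourceExample {
  public static void main(String[] args) {
    // vCores is an int, so one core is the smallest allocatable CPU share;
    // "0.5 cores" is not representable with the current Resource record.
    Resource r = Resource.newInstance(1024, 1);
    System.out.println(r);  // e.g. <memory:1024, vCores:1>
  }
}
{code}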






[jira] [Resolved] (YARN-6449) Enable YARN to accept jobs with < 1 core allocations

2017-04-07 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-6449.

Resolution: Duplicate

> Enable YARN to accept jobs with < 1 core allocations
> 
>
> Key: YARN-6449
> URL: https://issues.apache.org/jira/browse/YARN-6449
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Daniel Tomes
>  Labels: features, performance
>
> Product Enhancement Request
> In Spark/HIVE/etc. I often need to complete work for which an entire core is 
> overkill such as managing a JDBC connection or doing a simple map/transform; 
> however, when I do this on large datasets, 1 core X 500 partitions/mappers 
> winds up with quite the cluster level footprint even though most of those 
> processor cycles are idle.
> I propose that we enable YARN to allow a user to submit jobs that "allocate < 
> 1 core". Under the covers, the JVM will still receive one core but YARN/ZK 
> could keep track of the fractions of cores being used and allow other jobs to 
> consume the same core twice provided that both jobs were submitted with <= .5 
> cores. Now, YARN can more effectively utilize multi-threading and decrease 
> CPU idle for the power users.
> Obviously this can ultimately result in very bad outcomes, but if we also 
> enable security controls, then customers can configure things so that only 
> admins/gates can submit with < 1 full core, ultimately resulting in a 
> cluster that can do more.


