[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-22 Thread lachisis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021630#comment-15021630
 ] 

lachisis commented on YARN-4382:


If many container hierarchies remain, they keep the CPU of this node busy even 
when no jobs are running.

{noformat}
--
   PerfTop:  129889 irqs/sec  kernel:76.3% [10 cycles],  (all, 16 CPUs)
--

   samples    pcnt   kernel function
   _______   _____   _______________

   117166.00 - 59.1% : tg_shares_up
    35688.00 - 18.0% : _spin_lock_irqsave
    12045.00 -  6.1% : __set_se_shares
{noformat}

> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
> 
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.2
>Reporter: lachisis
>
> This issue can occur when LinuxContainerExecutor is used to run the containers.
> Normally, when a container runs, a corresponding hierarchy is created in the 
> cgroup directory, and when the container terminates the hierarchy is deleted 
> within a few seconds (the delay can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, CgroupsLCEResourcesHandler sends a signal to kill the container 
> process asynchronously and, at the same time, tries to delete the container 
> hierarchy within the configured "delete-delay-ms" window.
> But if killing the container process takes longer than "delete-delay-ms", the 
> container hierarchy remains forever.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021666#comment-15021666
 ] 

Jun Gong commented on YARN-4382:


'release_agent' in cgroups 
([https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt]) might help in 
this case. Maybe we could use it to remove the empty dirs? SLURM already uses it 
([http://slurm.schedmd.com/cgroups.html]); a small illustrative sketch follows 
the quoted documentation below.

{quote}
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the 
last task in the cgroup leaves (exits or attaches to some other cgroup) and the 
last child cgroup of that cgroup is removed, then the kernel runs the command 
specified by the contents of the "release_agent" file in that hierarchy's root 
directory, supplying the pathname (relative to the mount point of the cgroup 
file system) of the abandoned cgroup.  This enables automatic removal of 
abandoned cgroups.  The default value of notify_on_release in the root cgroup 
at system boot is disabled (0).  The default value of other cgroups at creation 
is the current value of their parents' notify_on_release settings. The default 
value of a cgroup hierarchy's release_agent path is empty.
{quote}
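
For illustration only, here is a minimal Java sketch of what enabling this 
mechanism could look like: it writes the release_agent path and the 
notify_on_release flag at the root of a cgroup hierarchy. The hierarchy path and 
the cleanup script name are assumptions, not part of any existing patch, and 
whether this interacts cleanly with the NodeManager's own delete logic would 
still need to be verified.

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupReleaseAgentSetup {

  /**
   * Enable automatic cleanup of abandoned cgroups under the given hierarchy
   * root (path and script name are hypothetical). The agent script would
   * typically just rmdir the relative cgroup path the kernel passes as $1.
   */
  public static void enableReleaseAgent(String hierarchyRoot, String agentScript)
      throws IOException {
    Path root = Paths.get(hierarchyRoot);
    // release_agent only exists at the root of the hierarchy.
    Files.write(root.resolve("release_agent"),
        agentScript.getBytes(StandardCharsets.UTF_8));
    // notify_on_release=1 makes the kernel run the agent when the last task
    // leaves the cgroup and its last child cgroup is removed; cgroups created
    // afterwards inherit the flag from their parent.
    Files.write(root.resolve("notify_on_release"),
        "1".getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    enableReleaseAgent("/sys/fs/cgroup/cpu/hadoop-yarn",
        "/usr/local/sbin/yarn-cgroup-release-agent.sh");
  }
}
{code}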

> Container hierarchy in cgroup may remain for ever after the container have be 
> terminated
> 
>
> Key: YARN-4382
> URL: https://issues.apache.org/jira/browse/YARN-4382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.2
>Reporter: lachisis
>
> This issue can occur when LinuxContainerExecutor is used to run the containers.
> Normally, when a container runs, a corresponding hierarchy is created in the 
> cgroup directory, and when the container terminates the hierarchy is deleted 
> within a few seconds (the delay can be configured by 
> yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).
> In the code, CgroupsLCEResourcesHandler sends a signal to kill the container 
> process asynchronously and, at the same time, tries to delete the container 
> hierarchy within the configured "delete-delay-ms" window.
> But if killing the container process takes longer than "delete-delay-ms", the 
> container hierarchy remains forever.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4382) Container hierarchy in cgroup may remain for ever after the container have be terminated

2015-11-22 Thread lachisis (JIRA)
lachisis created YARN-4382:
--

 Summary: Container hierarchy in cgroup may remain for ever after 
the container have be terminated
 Key: YARN-4382
 URL: https://issues.apache.org/jira/browse/YARN-4382
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.2
Reporter: lachisis


This issue can occur when LinuxContainerExecutor is used to run the containers.
Normally, when a container runs, a corresponding hierarchy is created in the 
cgroup directory, and when the container terminates the hierarchy is deleted 
within a few seconds (the delay can be configured by 
yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms).

In the code, CgroupsLCEResourcesHandler sends a signal to kill the container 
process asynchronously and, at the same time, tries to delete the container 
hierarchy within the configured "delete-delay-ms" window.
But if killing the container process takes longer than "delete-delay-ms", the 
container hierarchy remains forever.




  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-22 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated YARN-4381:

Attachment: YARN-4381.001.patch

I attached an initial patch that adds two new metrics to {{NodeManagerMetrics}} 
(a sketch of the declaration pattern follows the list):
* containerLocalizeFailed
* containersLaunchEventOperation
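
For context, counters in {{NodeManagerMetrics}} are normally declared with the 
metrics2 annotations and incremented from the container lifecycle code. The 
sketch below only illustrates that pattern with the two names above; it is not 
the attached patch, and registration with the metrics system is omitted.

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

@Metrics(about = "Sketch of NodeManager metric additions", context = "yarn")
public class NodeManagerMetricsSketch {

  // The metrics system instantiates @Metric fields when the bean is registered.
  @Metric("# of container localization failures")
  MutableCounterInt containerLocalizeFailed;

  @Metric("# of container launch events handled")
  MutableCounterInt containersLaunchEventOperation;

  // Would be called from the localization-failure path.
  public void localizeFailedContainer() {
    containerLocalizeFailed.incr();
  }

  // Would be called when a container launch event is actually dispatched.
  public void launchEventContainer() {
    containersLaunchEventOperation.incr();
  }
}
{code}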


> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers. A container can still fail later, for 
> example when it receives a kill command or when container localization fails, 
> yet the counter is incremented in the code below whenever a container start 
> is accepted, regardless of whether the container eventually succeeds or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4165) An outstanding container request makes all nodes to be reserved causing all jobs pending

2015-11-22 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-4165:
--
Description: We have a long-running service in YARN with an outstanding 
container request that YARN cannot satisfy (it requires more memory than the 
NodeManager can supply). YARN then reserves all nodes for this application, and 
when I submit other jobs (which require relatively little memory, within what 
the NodeManager can supply), all jobs are pending because YARN skips scheduling 
containers on the nodes that have been reserved.  (was: We have a long running 
service in YARN, it has a outstanding container request that YARN cannot 
satisfy (require more memory that nodemanager can supply). Then YARN reserves 
all nodes for this application, when I submit other jobs (require relative 
small memory that nodemanager can supply), all jobs are pending because YARN 
skips scheduling containers on the nodes that have been reserved.)

> An outstanding container request makes all nodes to be reserved causing all 
> jobs pending
> 
>
> Key: YARN-4165
> URL: https://issues.apache.org/jira/browse/YARN-4165
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>
> We have a long-running service in YARN with an outstanding container request 
> that YARN cannot satisfy (it requires more memory than the NodeManager can 
> supply). YARN then reserves all nodes for this application, and when I submit 
> other jobs (which require relatively little memory, within what the 
> NodeManager can supply), all jobs are pending because YARN skips scheduling 
> containers on the nodes that have been reserved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021397#comment-15021397
 ] 

Inigo Goiri commented on YARN-3980:
---

It looks like v9 solves the issues.
Let me know if you see anything else.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, 
> YARN-3980-v8.patch, YARN-3980-v9.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 
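
At a high level, the plumbing amounts to copying the utilization reported in 
the heartbeat onto the scheduler's view of the node, so placement decisions can 
consider how busy a node actually is rather than only how much is allocated. 
The sketch below is purely illustrative and self-contained; the types are 
stand-ins, not the real YARN classes, and the real wiring is what the attached 
patches implement.

{code}
// Stand-in value type; the real utilization record lives in the YARN API.
class UtilizationSnapshot {
  final int physicalMemoryMB;
  final float cpuFraction;

  UtilizationSnapshot(int physicalMemoryMB, float cpuFraction) {
    this.physicalMemoryMB = physicalMemoryMB;
    this.cpuFraction = cpuFraction;
  }
}

// Stand-in for the scheduler's per-node bookkeeping object.
class SchedulerNodeSketch {
  private volatile UtilizationSnapshot nodeUtilization;
  private volatile UtilizationSnapshot aggregatedContainersUtilization;

  // Called from the node-update path when a heartbeat arrives.
  void updateUtilization(UtilizationSnapshot node,
      UtilizationSnapshot containers) {
    this.nodeUtilization = node;
    this.aggregatedContainersUtilization = containers;
  }

  UtilizationSnapshot getNodeUtilization() {
    return nodeUtilization;
  }

  UtilizationSnapshot getAggregatedContainersUtilization() {
    return aggregatedContainersUtilization;
  }
}
{code}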



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021243#comment-15021243
 ] 

Hadoop QA commented on YARN-3980:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 57s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s 
{color} | {color:red} Patch generated 7 new checkstyle issues in root (total 
was 432, now 438). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 34s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 35s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 14s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 5m 46s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s 
{color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 29s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 21s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| 

[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021271#comment-15021271
 ] 

Hadoop QA commented on YARN-3980:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
45s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
30s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 4s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 4s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s 
{color} | {color:red} Patch generated 7 new checkstyle issues in root (total 
was 433, now 439). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 10s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 13s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s 
{color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 45s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_85. {color} |
| 

[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021364#comment-15021364
 ] 

Hadoop QA commented on YARN-3980:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
8s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
13s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
19s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 19s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 8s 
{color} | {color:red} Patch generated 7 new checkstyle issues in root (total 
was 433, now 439). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 55s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 12s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 21s {color} 
| {color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} 
| {color:red} hadoop-sls in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 12s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| 

[jira] [Created] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-11-22 Thread Lin Yiqun (JIRA)
Lin Yiqun created YARN-4381:
---

 Summary: Add container launchEvent and container localizeFailed 
metrics in container
 Key: YARN-4381
 URL: https://issues.apache.org/jira/browse/YARN-4381
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: Lin Yiqun


Recently I found an issue with the NodeManager metrics: 
{{NodeManagerMetrics#containersLaunched}} does not actually count the number of 
successfully launched containers. A container can still fail later, for example 
when it receives a kill command or when container localization fails, yet the 
counter is incremented in the code below whenever a container start is 
accepted, regardless of whether the container eventually succeeds or fails.
{code}
Credentials credentials = parseCredentials(launchContext);

Container container =
new ContainerImpl(getConfig(), this.dispatcher,
context.getNMStateStore(), launchContext,
  credentials, metrics, containerTokenIdentifier);
ApplicationId applicationID =
containerId.getApplicationAttemptId().getApplicationId();
if (context.getContainers().putIfAbsent(containerId, container) != null) {
  NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
"ContainerManagerImpl", "Container already running on this node!",
applicationID, containerId);
  throw RPCUtil.getRemoteException("Container " + containerIdStr
  + " already is running on this node!!");
}

this.readLock.lock();
try {
  if (!serviceStopped) {
// Create the application
Application application =
new ApplicationImpl(dispatcher, user, applicationID, credentials, 
context);
if (null == context.getApplications().putIfAbsent(applicationID,
  application)) {
  LOG.info("Creating a new application reference for app " + 
applicationID);
  LogAggregationContext logAggregationContext =
  containerTokenIdentifier.getLogAggregationContext();
  Map<ApplicationAccessType, String> appAcls =
  container.getLaunchContext().getApplicationACLs();
  context.getNMStateStore().storeApplication(applicationID,
  buildAppProto(applicationID, user, credentials, appAcls,
logAggregationContext));
  dispatcher.getEventHandler().handle(
new ApplicationInitEvent(applicationID, appAcls,
  logAggregationContext));
}

this.context.getNMStateStore().storeContainer(containerId, request);
dispatcher.getEventHandler().handle(
  new ApplicationContainerInitEvent(container));

this.context.getContainerTokenSecretManager().startContainerSuccessful(
  containerTokenIdentifier);
NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
  "ContainerManageImpl", applicationID, containerId);
// TODO launchedContainer misplaced -> doesn't necessarily mean a container
// launch. A finished Application will not launch containers.
metrics.launchedContainer();
metrics.allocateContainer(containerTokenIdentifier.getResource());
  } else {
throw new YarnException(
"Container start failed as the NodeManager is " +
"in the process of shutting down");
  }
{code}
In addition, we lack a localizationFailed metric for containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler

2015-11-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021053#comment-15021053
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy], Thank you very much!

> Consider user limit when calculating total pending resource for preemption 
> policy in Capacity Scheduler
> ---
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.7.3
>
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, 
> YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, 
> YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, 
> YARN-3769.005.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4379) TestWebApp failing in trunk

2015-11-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020948#comment-15020948
 ] 

Steve Loughran commented on YARN-4379:
--

I know it's a simple patch, but I'm debating reverting HADOOP-12584 directly and 
requiring that patch to include this. Authors of patches have to get into the 
habit of being rigorous about testing things, and for changes to Hadoop core 
that includes YARN and HDFS test runs. Web-wise, YARN and downstream apps are 
particularly fussy.

> TestWebApp failing in trunk
> ---
>
> Key: YARN-4379
> URL: https://issues.apache.org/jira/browse/YARN-4379
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4379.01.patch
>
>
> {noformat}
> Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.917 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.webapp.TestWebApp
> testYARNWebAppContext(org.apache.hadoop.yarn.webapp.TestWebApp)  Time 
> elapsed: 1.828 sec  <<< ERROR!
> java.lang.RuntimeException: java.io.IOException: Server returned HTTP 
> response code: 403 for URL: http://localhost:57512/static/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628)
>   at 
> org.apache.hadoop.yarn.webapp.TestWebApp.getContent(TestWebApp.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.TestWebApp.testYARNWebAppContext(TestWebApp.java:276)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021151#comment-15021151
 ] 

Hadoop QA commented on YARN-3456:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 31s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 36, now 37). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 3s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 20s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 5s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 18s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3980:
--
Attachment: YARN-3980-v8.patch

Tackling Karthik's comments.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3980:
--
Attachment: (was: YARN-3980-v8.patch)

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3980:
--
Attachment: YARN-3980-v8.patch

Increasing timeout to 2 seconds again because unit tests fail.

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3456:
---
Attachment: YARN-3456.02.patch

> Improve handling of incomplete TimelineEntities
> ---
>
> Key: YARN-3456
> URL: https://issues.apache.org/jira/browse/YARN-3456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-3456.01.patch, YARN-3456.02.patch
>
>
> If an incomplete TimelineEntity is posted, it isn't checked client side ... 
> it gets all the way to the far end before triggering an NPE in the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021118#comment-15021118
 ] 

Varun Saxena commented on YARN-3456:


Sorry Kuhu, I had missed your comment.
I will have bandwidth to handle this JIRA. It's a straightforward fix. Will 
update a patch shortly.

> Improve handling of incomplete TimelineEntities
> ---
>
> Key: YARN-3456
> URL: https://issues.apache.org/jira/browse/YARN-3456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
>
> If an incomplete TimelineEntity is posted, it isn't checked client side ... 
> it gets all the way to the far end before triggering an NPE in the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3456:
---
Attachment: YARN-3456.01.patch

> Improve handling of incomplete TimelineEntities
> ---
>
> Key: YARN-3456
> URL: https://issues.apache.org/jira/browse/YARN-3456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-3456.01.patch
>
>
> If an incomplete TimelineEntity is posted, it isn't checked client side ... 
> it gets all the way to the far end before triggering an NPE in the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021128#comment-15021128
 ] 

Varun Saxena commented on YARN-3456:


Updated the patch. Added checks on both the client and the server side.
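
A rough sketch of the kind of client-side check described (the helper class is 
made up; the getters are the standard TimelineEntity accessors, and the actual 
patch may validate more fields):

{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class TimelineEntityValidator {

  private TimelineEntityValidator() {
  }

  // Reject obviously incomplete entities before they are posted, instead of
  // letting the store blow up with an NPE on the server side.
  public static void validate(TimelineEntity entity) throws YarnException {
    if (entity == null) {
      throw new YarnException("Incomplete timeline entity: entity is null");
    }
    if (entity.getEntityId() == null || entity.getEntityType() == null) {
      throw new YarnException(
          "Incomplete timeline entity: entity id/type must not be null");
    }
  }
}
{code}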

> Improve handling of incomplete TimelineEntities
> ---
>
> Key: YARN-3456
> URL: https://issues.apache.org/jira/browse/YARN-3456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-3456.01.patch
>
>
> If an incomplete TimelineEntity is posted, it isn't checked client side ... 
> it gets all the way to the far end before triggering an NPE in the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4298) Fix findbugs warnings in hadoop-yarn-common

2015-11-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021136#comment-15021136
 ] 

Varun Saxena commented on YARN-4298:


bq. please let me know if this is handled by you in any other tickets.
Thanks for taking this up, [~sunilg]. No, I haven't handled it anywhere else.

> Fix findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-4298
> URL: https://issues.apache.org/jira/browse/YARN-4298
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-4298.patch, 0002-YARN-4298.patch
>
>
> {noformat}
>  classname='org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl'>
>  category='MT_CORRECTNESS' message='Inconsistent synchronization of 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.builder;
>  locked 95% of time' lineNumber='390'/>
>  category='MT_CORRECTNESS' message='Inconsistent synchronization of 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.proto;
>  locked 94% of time' lineNumber='390'/>
>  category='MT_CORRECTNESS' message='Inconsistent synchronization of 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.viaProto;
>  locked 94% of time' lineNumber='390'/>
> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8

2015-11-22 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-4380:
-
Summary: 
TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
intermittently on branch-2.8  (was: 
TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
on branch-2.8)

> TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
> intermittently on branch-2.8
> --
>
> Key: YARN-4380
> URL: https://issues.apache.org/jira/browse/YARN-4380
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Tsuyoshi Ozawa
>
> {quote}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 0.109 sec  <<< FAILURE!
> org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
> Argument(s) are different! Wanted:
> deletionService.delete(
> "user0",
> null,
> 
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> Actual invocation has different arguments:
> deletionService.delete(
> "user0",
> 
> /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1438) When a container fails, the text of the exception isn't included in the diagnostics

2015-11-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-1438:
-
Fix Version/s: 2.3.0

> When a container fails, the text of the exception isn't included in the 
> diagnostics
> ---
>
> Key: YARN-1438
> URL: https://issues.apache.org/jira/browse/YARN-1438
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 2.3.0
>
> Attachments: YARN-1438-001.patch
>
>
> The diagnostics text generated when a container execution thrown an exception 
> doesn't include the exception message -only the stack trace. This makes 
> debugging harder than necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4379) TestWebApp failing in trunk

2015-11-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021086#comment-15021086
 ] 

Varun Saxena commented on YARN-4379:


Ok... if HADOOP-12584 is reverted, it should be handled there.
Probably the QA reports should run the YARN, MR and HDFS tests too while 
building common, but that can increase build times.

> TestWebApp failing in trunk
> ---
>
> Key: YARN-4379
> URL: https://issues.apache.org/jira/browse/YARN-4379
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4379.01.patch
>
>
> {noformat}
> Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.917 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.webapp.TestWebApp
> testYARNWebAppContext(org.apache.hadoop.yarn.webapp.TestWebApp)  Time 
> elapsed: 1.828 sec  <<< ERROR!
> java.lang.RuntimeException: java.io.IOException: Server returned HTTP 
> response code: 403 for URL: http://localhost:57512/static/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1628)
>   at 
> org.apache.hadoop.yarn.webapp.TestWebApp.getContent(TestWebApp.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.TestWebApp.testYARNWebAppContext(TestWebApp.java:276)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2444) Primary filters added after first submission not indexed, cause exceptions in logs.

2015-11-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-2444.
--
Resolution: Won't Fix

Closing as Won't Fix; it's moot in ATS2, and the SPARK-1537 code does its own 
REST client.

> Primary filters added after first submission not indexed, cause exceptions in 
> logs.
> ---
>
> Key: YARN-2444
> URL: https://issues.apache.org/jira/browse/YARN-2444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.5.0
>Reporter: Marcelo Vanzin
>Assignee: Steve Loughran
> Attachments: YARN-2444-001.patch, ats.java, 
> org.apache.hadoop.yarn.server.timeline.TestTimelineClientPut-output.txt
>
>
> See attached code for an example. The code creates an entity with a primary 
> filter, submits it to the ATS. After that, a new primary filter value is 
> added and the entity is resubmitted. At that point two things can be seen:
> - Searching for the new primary filter value does not return the entity
> - The following exception shows up in the logs:
> {noformat}
> 14/08/22 11:33:42 ERROR webapp.TimelineWebServices: Error when verifying 
> access for user dr.who (auth:SIMPLE) on the events of the timeline entity { 
> id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test }
> org.apache.hadoop.yarn.exceptions.YarnException: Owner information of the 
> timeline entity { id: testid-48625678-9cbb-4e71-87de-93c50be51d1a, type: test 
> } is corrupted.
> at 
> org.apache.hadoop.yarn.server.timeline.security.TimelineACLsManager.checkAccess(TimelineACLsManager.java:67)
> at 
> org.apache.hadoop.yarn.server.timeline.webapp.TimelineWebServices.getEntities(TimelineWebServices.java:172)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8

2015-11-22 Thread Tsuyoshi Ozawa (JIRA)
Tsuyoshi Ozawa created YARN-4380:


 Summary: 
TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails 
on branch-2.8
 Key: YARN-4380
 URL: https://issues.apache.org/jira/browse/YARN-4380
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 2.8.0
Reporter: Tsuyoshi Ozawa


{quote}
Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 0.109 sec  <<< FAILURE!
org.mockito.exceptions.verification.junit.ArgumentsAreDifferent:
Argument(s) are different! Wanted:
deletionService.delete(
"user0",
null,

);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
Actual invocation has different arguments:
deletionService.delete(
"user0",

/home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42
);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322)
{quote}
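
For readers less familiar with this Mockito failure mode, here is a small illustrative example (using a made-up Cleaner interface and test class, not the real DeletionService API): the verification pins an argument to null while the production code passes a concrete container directory, which is exactly the mismatch reported in the log above.
{code}
import static org.mockito.Mockito.*;
import org.junit.Test;

public class CleanerVerificationExample {
  // Hypothetical stand-in for the service under test; not YARN code.
  interface Cleaner {
    void delete(String user, String dir);
  }

  @Test
  public void verifyDeletionWithoutPinningThePath() {
    Cleaner cleaner = mock(Cleaner.class);
    cleaner.delete("user0", "/usercache/user0/appcache/app_1/container_42");

    // Strict form -- fails with ArgumentsAreDifferent, like the log above:
    // verify(cleaner).delete("user0", null);

    // Tolerant form -- only requires that some directory was deleted for the user:
    verify(cleaner).delete(eq("user0"), anyString());
  }
}
{code}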



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3456) Improve handling of incomplete TimelineEntities

2015-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021188#comment-15021188
 ] 

Hadoop QA commented on YARN-3456:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
trunk has 3 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 22s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 20s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 35s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 11s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
25s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 3s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.logaggregation.TestAggregatedLogDeletionService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL 

[jira] [Comment Edited] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021189#comment-15021189
 ] 

Karthik Kambatla edited comment on YARN-3980 at 11/22/15 9:11 PM:
--

Jenkins could be running on a (virtual) machine much slower than dev machines. 
Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the 
following:
{code} 
for (int i = 0; i < 100; i++) {
  Thread.sleep(100);
  // check if heartbeat propagated
}
{code}
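
In case it helps to see the idea spelled out, a minimal sketch of the bounded-polling variant (not the committed test code); checkPropagated() is a made-up placeholder for whatever condition the test actually verifies:
{code}
// Sketch only: poll up to 100 x 100 ms (~10 s) instead of relying on one fixed sleep,
// and stop as soon as the heartbeat value has shown up.
private boolean waitForHeartbeatPropagation() throws InterruptedException {
  for (int i = 0; i < 100; i++) {
    if (checkPropagated()) {   // placeholder: e.g. compare reported vs. expected utilization
      return true;
    }
    Thread.sleep(100);
  }
  return false;                // caller asserts on this and fails the test on timeout
}
{code}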


was (Author: kasha):
Jenkins could be running on a (virtual) machine much slower than dev machines. 
Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the 
following:
{code}
for (int i = 0; i < 100; i++) {
  Thread.sleep(100);
  // check if heartbeat propagated
}
{code}

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021189#comment-15021189
 ] 

Karthik Kambatla commented on YARN-3980:


Jenkins could be running on a (virtual) machine much slower than dev machines. 
Could we sleep for longer to ensure it doesn't fail on Jenkins? How about the 
following:
{code}
for (int i = 0; i < 100; i++) {
  Thread.sleep(100);
  // check if heartbeat propagated
}
{code}

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021201#comment-15021201
 ] 

Inigo Goiri commented on YARN-3980:
---

Any easy place to check in the RM for the heartbeats?

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, YARN-3980-v8.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3980) Plumb resource-utilization info in node heartbeat through to the scheduler

2015-11-22 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3980:
--
Attachment: YARN-3980-v9.patch

Waiting for the heartbeat to propagate based on the expected value. This approach 
is similar to what is used in TestDiskFailures.
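
(For context, a rough sketch of that wait-on-expected-value pattern; the attached patch may well be implemented differently, and "expected"/getReportedUtilization() below are placeholders for whatever the test actually compares. It leans on Hadoop's GenericTestUtils.waitFor test helper.)
{code}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;
import com.google.common.base.Supplier;

// ... inside the test, after the NM heartbeat has been triggered:
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // Placeholder accessors: compare whatever the RM/scheduler is expected
    // to expose after the heartbeat with the value the test is waiting for.
    return expected.equals(getReportedUtilization());
  }
}, 100, 10000);  // poll every 100 ms, give up (and fail) after 10 seconds
{code}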

> Plumb resource-utilization info in node heartbeat through to the scheduler
> --
>
> Key: YARN-3980
> URL: https://issues.apache.org/jira/browse/YARN-3980
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-3980-v0.patch, YARN-3980-v1.patch, 
> YARN-3980-v2.patch, YARN-3980-v3.patch, YARN-3980-v4.patch, 
> YARN-3980-v5.patch, YARN-3980-v6.patch, YARN-3980-v7.patch, 
> YARN-3980-v8.patch, YARN-3980-v9.patch
>
>
> YARN-1012 and YARN-3534 collect resource utilization information for all 
> containers and the node respectively and send it to the RM on node heartbeat. 
> We should plumb it through to the scheduler so the scheduler can make use of 
> it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)