[jira] [Comment Edited] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-08-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405371#comment-15405371
 ] 

Sunil G edited comment on YARN-4624 at 8/3/16 5:54 AM:
---

Yes [~Naganarasimha Garla]. Thanks for the update. Attaching a rebased patch 
given by [~brahmareddy]. A test case is not needed, as we are only changing the 
data type from a boxed Float to a primitive float.


was (Author: sunilg):
Yes [~Naganarasimha Garla]. Thanks for the updated. Attaching a rebased patch 
given by [~brahmareddy]. Test case is not needed as we are changes data type 
from boxed to normal float.
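
For context on the fix: the NPE in the stack trace quoted below comes from 
auto-unboxing a null boxed value, which a primitive field avoids because it 
defaults to 0. A minimal, self-contained Java sketch of the before/after 
behaviour; the class, field, and getter names are simplified stand-ins, not the 
actual PartitionQueueCapacitiesInfo source:

{code}
// Illustrates the failure mode and the fix: a boxed Float that was never set
// throws a NullPointerException when auto-unboxed, while a primitive float
// simply defaults to 0.0f.
public class BoxedVsPrimitiveFloat {
  private Float boxedMaxAMLimitPercentage;     // stays null when never set
  private float primitiveMaxAMLimitPercentage; // defaults to 0.0f

  float getBoxed() {
    return boxedMaxAMLimitPercentage;          // NPE here when the field is null
  }

  float getPrimitive() {
    return primitiveMaxAMLimitPercentage;      // safe: returns 0.0f
  }

  public static void main(String[] args) {
    BoxedVsPrimitiveFloat info = new BoxedVsPrimitiveFloat();
    System.out.println(info.getPrimitive());   // prints 0.0
    System.out.println(info.getBoxed());       // throws NullPointerException
  }
}
{code}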

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.4.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}






[jira] [Updated] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-08-02 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4624:
--
Attachment: YARN-4624.4.patch

Yes [~Naganarasimha Garla]. Thanks for the update. Attaching a rebased patch 
given by [~brahmareddy]. A test case is not needed, as we are only changing the 
data type from a boxed Float to a primitive float.

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.4.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}






[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory

2016-08-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405355#comment-15405355
 ] 

Allen Wittenauer commented on YARN-5428:


Why would an admin provide creds and not individual users?  Why should there be 
a global store of credentials?  What prevents a user from stealing these global 
creds?

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config, to avoid the need to docker login on each 
> cluster member. 
> By default the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching and it may also be desirable to centralize this configuration 
> beyond the single user's home directory.
> Note that the command line arg is for the configuration directory NOT the 
> configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within container executor.






[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405349#comment-15405349
 ] 

Hudson commented on YARN-5456:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10198 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10198/])
YARN-5456. container-executor support for FreeBSD, NetBSD, and others if 
(cnauroth: rev b913677365ad77ca7daa5741c04c14df1a0313cd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/configuration.h
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/get_executable.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/config.h.cmake


> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system specific, for security 
> reasons. We should add support for FreeBSD to unbreak its ports entry, for 
> NetBSD (the sysctl options are just in a different order), and, for operating 
> systems that do not have a defined method, an escape hatch.






[jira] [Updated] (YARN-5430) Get container's ip and host from NM

2016-08-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5430:
--
Attachment: YARN-5430.3.patch

> Get container's ip and host from NM
> ---
>
> Key: YARN-5430
> URL: https://issues.apache.org/jira/browse/YARN-5430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5430.1.patch, YARN-5430.2.patch, YARN-5430.3.patch
>
>
> In YARN-4757, we introduced a DNS mechanism for containers. That is based on 
> the assumption that we can get the container's IP and host information and 
> store it in the registry service. This JIRA aims to get the container's IP 
> and host from the NM, primarily for Docker containers.






[jira] [Commented] (YARN-5342) Improve non-exclusive node partition resource allocation in Capacity Scheduler

2016-08-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405332#comment-15405332
 ] 

Naganarasimha G R commented on YARN-5342:
-

Thanks for attaching the patch for 2.8. The Jenkins run looks fine, and the 
test case failures are not related to the patch.
[~wangda], can you commit the patch and resolve this JIRA?

> Improve non-exclusive node partition resource allocation in Capacity Scheduler
> --
>
> Key: YARN-5342
> URL: https://issues.apache.org/jira/browse/YARN-5342
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: YARN-5342-branch-2.8.001.patch, YARN-5342.1.patch, 
> YARN-5342.2.patch, YARN-5342.3.patch, YARN-5342.4.patch
>
>
> In the previous implementation, one non-exclusive container allocation is 
> possible only when missed-opportunity >= #cluster-nodes, and 
> missed-opportunity is reset whenever a container is allocated on any node.
> This slows down the frequency of container allocation on a non-exclusive 
> node partition: *when a non-exclusive partition=x has idle resources, we can 
> only allocate one container for this app every 
> X=nodemanagers.heartbeat-interval secs for the whole cluster.*
> In this JIRA, I propose a fix to reset missed-opportunity only if we have >0 
> pending resource for the non-exclusive partition OR we get an allocation from 
> the default partition.
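
To make the proposed reset rule concrete, here is an illustrative, 
self-contained Java sketch; the class, method, and field names are invented for 
this example and are not the actual CapacityScheduler code:

{code}
// Illustrative model of the proposal: the counter is reset only under the
// stated condition instead of unconditionally on every allocation.
public class MissedOpportunitySketch {
  private int missedOpportunities;

  /** Called when a container is allocated to this application. */
  void onContainerAllocated(long pendingOnNonExclusivePartition,
                            boolean allocatedFromDefaultPartition) {
    // Old behaviour: reset unconditionally, which throttles non-exclusive
    // allocation to roughly one container per heartbeat interval.
    // Proposed behaviour: reset only in the two cases described above.
    if (pendingOnNonExclusivePartition > 0 || allocatedFromDefaultPartition) {
      missedOpportunities = 0;
    }
  }

  /** Called when a node heartbeat skips this application. */
  void onNodeSkipped() {
    missedOpportunities++;
  }

  /** Non-exclusive allocation is attempted once enough nodes were skipped. */
  boolean canAllocateNonExclusive(int clusterNodes) {
    return missedOpportunities >= clusterNodes;
  }

  public static void main(String[] args) {
    MissedOpportunitySketch app = new MissedOpportunitySketch();
    app.onNodeSkipped();
    app.onNodeSkipped();
    System.out.println(app.canAllocateNonExclusive(2)); // true after 2 skips
    app.onContainerAllocated(0, false);                 // condition not met: no reset
    System.out.println(app.canAllocateNonExclusive(2)); // still true
    app.onContainerAllocated(1024, false);              // pending > 0: reset
    System.out.println(app.canAllocateNonExclusive(2)); // false
  }
}
{code}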






[jira] [Commented] (YARN-5448) Resource in Cluster Metrics is not sum of resources in all nodes of all partitions

2016-08-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405320#comment-15405320
 ] 

Naganarasimha G R commented on YARN-5448:
-

Thanks for sharing your thoughts, [~wangda].

bq. Sorry I may not quite sure about this. Could you explain?
What I meant was: these additional non-usable-resource columns in the cluster 
metrics table will be useful only when there is a configuration error, and once 
that is corrected they are not of much use; basically, the columns serve almost 
no purpose when everything is configured correctly.
One alternative I can think of is to show these columns only when partitions 
are not mapped to queues, and to hide them when the value is zero. Thoughts?

bq.  which can help answering questions like "why I cannot fully utilize the 
cluster".
One viewpoint I had on this was captured in the above [comment | 
https://issues.apache.org/jira/browse/YARN-5448?focusedCommentId=15399248=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15399248],
 but again it is just a viewpoint and debatable, so I don't have any hard 
restrictions on having it.

bq. It's better to add a non-usable nodes as a separate col, but to me it may 
not a fully replacement of total non-usable resources.
Maybe I did not get the rationale for why {{"total non-usable resources"}} 
would be better than {{"non-usable nodes"}}; can you elaborate more on your 
view on this?

> Resource in Cluster Metrics is not sum of resources in all nodes of all 
> partitions
> --
>
> Key: YARN-5448
> URL: https://issues.apache.org/jira/browse/YARN-5448
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager, webapp
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: NodesPage.png, schedulerPage.png
>
>
> Currently the resource info in Cluster Metrics is derived from the Queue 
> Metrics' *available resource + allocated resource*. Hence, if some nodes 
> belong to a partition that is not associated with any queue, the capacity 
> scheduler's partition hierarchy shows those nodes' resources under the 
> partition, but Cluster Metrics does not show them.
> Apart from this, the Metrics overview table is also shown on the Nodes page, 
> so if we show resource info from Queue Metrics, users will not be able to 
> correlate the two (images attached for the same).
> IIUC, the idea of not showing it in the *Metrics overview table* is to 
> highlight that the configuration is not proper. This needs to be somehow 
> conveyed through the partition-by-queue-hierarchy chart.






[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405321#comment-15405321
 ] 

Jian He commented on YARN-5382:
---

bq. I see only one audit log message when I ran a sleep job and killed it on 
pseudo-distributed setup on my laptop
I checked the code further; that is because AppKilledTransition will no longer 
get the RMAppKillByClientEvent if an attempt exists - in that case 
AppKilledTransition processes the event sent from the RMAppAttempt. Anyway, 
this actually makes things better, because we won't have two audit logs.
- This code is exactly the same in two places; would you make a common method 
for it (see the sketch after this list)?
{code}
  if (event instanceof RMAppKillByClientEvent) {
RMAppKillByClientEvent killEvent = (RMAppKillByClientEvent) event;
UserGroupInformation callerUGI = killEvent.getCallerUGI();
String userName = null;
if (callerUGI != null) {
  userName = callerUGI.getShortUserName();
}
InetAddress remoteIP = killEvent.getIp();
RMAuditLogger.logSuccess(userName, AuditConstants.KILL_APP_REQUEST,
"RMAppImpl", event.getApplicationId(), remoteIP);
  }
{code}
- Isn't "greater than" the correct wording?
{code}
-Assert.assertTrue("application start time is not greater than 0",
+Assert.assertTrue("application start time is not greater then 0",
{code}
- Several parameters are not used in the method 
testSuccessLogFormatHelperWithIP; remove them?
- Nit highlighted by the IDE: in "returns the {@link CallerUGI}", CallerUGI is 
actually not a link.
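
A possible shape of the common method mentioned in the first point, assuming it 
sits in RMAppImpl alongside the types used in the snippet above; the helper 
name auditLogKillByClient is only a suggestion, not part of the patch:

{code}
// Sketch of the suggested common method (name and placement are suggestions).
private static void auditLogKillByClient(RMAppEvent event) {
  if (event instanceof RMAppKillByClientEvent) {
    RMAppKillByClientEvent killEvent = (RMAppKillByClientEvent) event;
    UserGroupInformation callerUGI = killEvent.getCallerUGI();
    String userName =
        (callerUGI == null) ? null : callerUGI.getShortUserName();
    InetAddress remoteIP = killEvent.getIp();
    RMAuditLogger.logSuccess(userName, AuditConstants.KILL_APP_REQUEST,
        "RMAppImpl", event.getApplicationId(), remoteIP);
  }
}
{code}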


> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, 
> YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, 
> YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, 
> YARN-5382-branch-2.7.09.patch, YARN-5382-branch-2.7.10.patch, 
> YARN-5382.06.patch, YARN-5382.07.patch, YARN-5382.08.patch, 
> YARN-5382.09.patch, YARN-5382.10.patch
>
>
> ClientRMService will audit a kill request but only if it either fails to 
> issue the kill or if the kill is sent to an already finished application.  It 
> does not create a log entry when the application is active which is arguably 
> the most important case to audit.






[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-02 Thread Sangeetha Abdu Jyothi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405299#comment-15405299
 ] 

Sangeetha Abdu Jyothi commented on YARN-5327:
-

Please note that the failed test is unrelated to this patch.

> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.






[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-08-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405281#comment-15405281
 ] 

Naganarasimha G R commented on YARN-4624:
-

[~sunilg], as discussed offline, the safer option is to go with patch 1, so can 
you rebase the patch so that we can make progress on this JIRA?

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}






[jira] [Commented] (YARN-5410) Bootstrap Router module

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405277#comment-15405277
 ] 

Hadoop QA commented on YARN-5410:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 59s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
12s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 19s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
29s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 15s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
19s {color} | {color:green} YARN-2915 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
hadoop-yarn-project {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 46s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 46s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 13s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 27s 
{color} | {color:red} root: The patch generated 2 new + 0 unchanged - 0 fixed = 
2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server 
hadoop-yarn-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 12s 
{color} | {color:red} hadoop-yarn-server-router in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 9s 
{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 22s {color} 
| {color:red} hadoop-yarn-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s 
{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 30s {color} 
| {color:red} hadoop-yarn-project in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 177m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.TestDirectoryCollection |
|   | hadoop.yarn.server.nodemanager.TestDirectoryCollection |
\\
\\
|| Subsystem || 

[jira] [Commented] (YARN-3664) Federation PolicyStore internal APIs

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405172#comment-15405172
 ] 

Hadoop QA commented on YARN-3664:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
49s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
54s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 30s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 56s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821736/YARN-3664-YARN-2915-v3.patch
 |
| JIRA Issue | YARN-3664 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux d0051a266a11 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 22db8fd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12621/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12621/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Federation PolicyStore internal APIs
> 
>
> Key: YARN-3664
> URL: https://issues.apache.org/jira/browse/YARN-3664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru 

[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory

2016-08-02 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405147#comment-15405147
 ] 

Zhankun Tang commented on YARN-5428:


Yes, agreed. It can store other settings besides credentials. Since the 
credential won't expire as long as the username and password don't change (it 
is just base64 encoded, not fetched from a server), the administrator must 
store it in advance.
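
For context on why the stored credential does not expire: the docker client's 
config.json keeps only the base64 encoding of username:password for each 
registry, so it stays valid until the password changes. A small Java sketch of 
that encoding (the credentials below are placeholders):

{code}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Shows that the stored "auth" value is just base64("username:password"),
// not a token fetched from the registry, so it never expires on its own.
public class DockerAuthEncodingSketch {
  public static void main(String[] args) {
    String user = "example-user";          // placeholder credentials
    String password = "example-password";
    String auth = Base64.getEncoder().encodeToString(
        (user + ":" + password).getBytes(StandardCharsets.UTF_8));
    System.out.println(auth);              // value an admin could pre-store
  }
}
{code}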

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config, to avoid the need to docker login on each 
> cluster member. 
> By default the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching and it may also be desirable to centralize this configuration 
> beyond the single user's home directory.
> Note that the command line arg is for the configuration directory NOT the 
> configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within container executor.






[jira] [Updated] (YARN-3664) Federation PolicyStore internal APIs

2016-08-02 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3664:
-
Attachment: YARN-3664-YARN-2915-v3.patch

Fixing Yetus warnings (v3).

> Federation PolicyStore internal APIs
> 
>
> Key: YARN-3664
> URL: https://issues.apache.org/jira/browse/YARN-3664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3664-YARN-2915-v0.patch, 
> YARN-3664-YARN-2915-v1.patch, YARN-3664-YARN-2915-v2.patch, 
> YARN-3664-YARN-2915-v3.patch
>
>
> The federation Policy Store contains information about the capacity 
> allocations made by users, their mapping to sub-clusters and the policies 
> that each of the components (Router, AMRMPRoxy, RMs) should enforce






[jira] [Commented] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405142#comment-15405142
 ] 

Konstantinos Karanasos commented on YARN-5468:
--

Thanks for the comment, [~cheersyang]. Yes, I have read YARN-4902.
It is definitely related, but in this JIRA we are focusing in particular on the 
scheduling of long-running jobs/services.
In that sense, YARN-4902 is more general. On the other hand, unlike YARN-4902, 
we will be providing the option of service planning, that is, we will be able 
to look at multiple services at once and plan their execution in a more 
holistic manner than the scheduler can do (given that the scheduler looks at 
one resource request at a time). This can be seen as related to the planning 
phase of YARN-1051.

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-5468.prototype.patch
>
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.






[jira] [Updated] (YARN-5410) Bootstrap Router module

2016-08-02 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-5410:
---
Attachment: YARN-5410-YARN-2915-v1.patch

> Bootstrap Router module
> ---
>
> Key: YARN-5410
> URL: https://issues.apache.org/jira/browse/YARN-5410
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-5410-YARN-2915-v1.patch
>
>
> As detailed in the proposal in the umbrella JIRA, we are introducing a new 
> component that routes client request to appropriate ResourceManager(s). This 
> JIRA tracks the creation of a new sub-module for the Router.






[jira] [Commented] (YARN-5451) Container localizers that hang are not cleaned up

2016-08-02 Thread Brook Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405060#comment-15405060
 ] 

Brook Zhou commented on YARN-5451:
--

Is this because the ContainerLocalizer is launched in a separate process from 
LCE with a timeOutInterval of 0?

> Container localizers that hang are not cleaned up
> -
>
> Key: YARN-5451
> URL: https://issues.apache.org/jira/browse/YARN-5451
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>
> I ran across an old, rogue process on one of our nodes.  It apparently was a 
> container localizer that somehow entered an infinite loop during startup.  
> The NM never cleaned up this broken localizer, so it happily ran forever.  
> The NM needs to do a better job of tracking localizers, including killing 
> them if they appear to be hung/broken.
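
As a general illustration of the kind of tracking being asked for (not 
NodeManager code): a launched helper process can be waited on with a timeout 
and forcibly killed if it never finishes. A hedged sketch using only the 
standard Java library; the command and timeout below are placeholders:

{code}
import java.util.concurrent.TimeUnit;

// Launch a child process, wait with a timeout, and kill it if it appears hung.
public class HungProcessReaper {
  public static void main(String[] args) throws Exception {
    Process child = new ProcessBuilder("sleep", "3600") // placeholder command
        .inheritIO()
        .start();
    boolean finished = child.waitFor(30, TimeUnit.SECONDS); // placeholder timeout
    if (!finished) {
      child.destroyForcibly(); // reap the apparently hung process
      child.waitFor();         // wait for it to actually exit
    }
  }
}
{code}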






[jira] [Comment Edited] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Panagiotis Garefalakis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405052#comment-15405052
 ] 

Panagiotis Garefalakis edited comment on YARN-5468 at 8/3/16 12:03 AM:
---

Attaching a patch to showcase the above proposal.

In this first patch we are introducing allocation tags and three placement 
constraints: affinity, anti-affinity, and cardinality. We are planning to 
consolidate those into a single constraint in the second version of the patch. 
For the time being we do not support time constraints.

In the current version the requests are accommodated in an online, greedy 
fashion.

We extend the distributed-shell application Client and AM to demonstrate 
inter-job placement constraints. Some unit tests are also included to show the 
supported constraints (affinity, anti-affinity, and cardinality) at the node 
and rack level.


was (Author: pgaref):
Attaching a patch to showcase above proposal. 

In this first patch we are introducing allocation tags and three placement 
constraints: affinity, anti-affinity, cardinality. We are planning to 
consolidate those in a single constraint in the 2nd version of the patch. For 
the time being we do not support time constraints.

We extend distribute-shell Client and AM to demonstrate affinity inter-job 
constraints. Some unit-tests are also included to show the supported 
constraints (affinity, anti-affinity, and cardinality) in Node and Rack level.
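
To give a flavour of how such constraints can be evaluated against a node's 
allocation tags (purely illustrative; the tag names, data structure, and method 
below are invented for this sketch and are not the API introduced by the patch; 
anti-affinity corresponds to a maximum cardinality of 0 and affinity to a 
minimum cardinality of at least 1):

{code}
import java.util.Collections;
import java.util.Map;

// Evaluates a cardinality-style constraint against a node's allocation tags.
public class CardinalityCheckSketch {
  static boolean satisfies(Map<String, Integer> nodeTagCounts, String tag,
                           int minCardinality, int maxCardinality) {
    int count = nodeTagCounts.getOrDefault(tag, 0);
    return count >= minCardinality && count <= maxCardinality;
  }

  public static void main(String[] args) {
    Map<String, Integer> node = Collections.singletonMap("hbase-master", 1);
    // anti-affinity to "hbase-master": fails, the node already hosts one
    System.out.println(satisfies(node, "hbase-master", 0, 0));                 // false
    // affinity to "hbase-master": succeeds
    System.out.println(satisfies(node, "hbase-master", 1, Integer.MAX_VALUE)); // true
  }
}
{code}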

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-5468.prototype.patch
>
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.






[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405056#comment-15405056
 ] 

Hadoop QA commented on YARN-5382:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m 36s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
21s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
18s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in branch-2.7 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 34 new + 684 unchanged - 7 fixed = 718 total (was 691) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3946 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 1m 41s 
{color} | {color:red} The patch 96 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101
 with JDK v1.7.0_101 generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_101. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 132m 53s {color} 
| {color:black} {color} 

[jira] [Commented] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405045#comment-15405045
 ] 

Weiwei Yang commented on YARN-5468:
---

Hi [~kkaranasos]

Have you read YARN-4902? It looks like what you are trying to address here has 
some overlap with that one. 

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.






[jira] [Issue Comment Deleted] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Panagiotis Garefalakis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated YARN-5468:
-
Comment: was deleted

(was: Uploading a first prototype)

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.






[jira] [Updated] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Panagiotis Garefalakis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated YARN-5468:
-
Attachment: LRS-Constraints-v2.patch

Uploading a first prototype

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: LRS-Constraints-v2.patch
>
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.






[jira] [Updated] (YARN-5121) fix some container-executor portability issues

2016-08-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5121:
---
Component/s: security

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
>  Labels: security
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.






[jira] [Commented] (YARN-3664) Federation PolicyStore internal APIs

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405005#comment-15405005
 ] 

Hadoop QA commented on YARN-3664:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
16s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
40s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 16s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821710/YARN-3664-YARN-2915-v2.patch
 |
| JIRA Issue | YARN-3664 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 2f6ebd4a847a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 22db8fd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12619/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/12619/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12619/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12619/console |
| Powered by | 

[jira] [Updated] (YARN-5121) fix some container-executor portability issues

2016-08-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5121:
---
Labels: security  (was: )

> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
>  Labels: security
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X Jenkins instance. Let's fix those. While we're there, let's also 
> try to take care of some of the other portability problems that have crept in 
> over the years, since it used to work great on Solaris but now doesn't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5456:
---
Component/s: security

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system-specific for security 
> reasons. We should add support for FreeBSD to unbreak its ports entry, for 
> NetBSD (the sysctl options are just in a different order), and, for operating 
> systems that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5456:
---
Labels: security  (was: )

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system-specific for security 
> reasons. We should add support for FreeBSD to unbreak its ports entry, for 
> NetBSD (the sysctl options are just in a different order), and, for operating 
> systems that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15405002#comment-15405002
 ] 

Konstantinos Karanasos commented on YARN-5468:
--

We will shortly upload a first prototype just to get some initial feedback. In 
this first patch we are introducing the placement constraints and extending the 
CapacityScheduler to take them into account during scheduling.
We will soon upload a design document as well.
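
Purely as an illustration of the kind of constraint we have in mind (this is 
not the actual YARN-5468 API; every class and method name below is 
hypothetical), an anti-affinity or cardinality constraint could be captured by 
something as simple as:
{code}
// Hypothetical sketch only -- not the YARN-5468 API; names are illustrative.
public final class PlacementConstraintSketch {

  public enum Type { AFFINITY, ANTI_AFFINITY, CARDINALITY }

  private final Type type;
  private final String targetTag;   // e.g. "hbase-regionserver"
  private final String scope;       // e.g. "node" or "rack"
  private final int maxCardinality; // only meaningful for CARDINALITY

  private PlacementConstraintSketch(Type type, String targetTag,
      String scope, int maxCardinality) {
    this.type = type;
    this.targetTag = targetTag;
    this.scope = scope;
    this.maxCardinality = maxCardinality;
  }

  /** Never co-locate containers carrying {@code tag} within {@code scope}. */
  public static PlacementConstraintSketch antiAffinity(String tag, String scope) {
    return new PlacementConstraintSketch(Type.ANTI_AFFINITY, tag, scope, 0);
  }

  /** Allow at most {@code max} containers carrying {@code tag} per {@code scope}. */
  public static PlacementConstraintSketch cardinality(String tag, String scope, int max) {
    return new PlacementConstraintSketch(Type.CARDINALITY, tag, scope, max);
  }
}
{code}
A scheduler extension would then check candidate nodes against such 
constraints before committing an allocation.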

> Scheduling of long-running applications
> ---
>
> Key: YARN-5468
> URL: https://issues.apache.org/jira/browse/YARN-5468
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, fairscheduler
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
>
> This JIRA is about the scheduling of applications with long-running tasks.
> It will include adding support to YARN for a richer set of scheduling 
> constraints (such as affinity, anti-affinity, cardinality and time 
> constraints), and extending the schedulers to take them into account during 
> placement of containers to nodes.
> We plan to have both an online version that will accommodate such requests as 
> they arrive, as well as a Long-running Application Planner that will make 
> more global decisions by considering multiple applications at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404998#comment-15404998
 ] 

Allen Wittenauer commented on YARN-5456:


I have a Kerberized Ubuntu/x86 VM that I generally use for testing things. I 
popped trunk plus this patch onto it, and things look like they are working the 
way they are supposed to.

I ran a simple sleep streaming job and ended up with the following directories 
in the nm-local-dir:
{code}
root@ku:/tmp/hadoop-yarn/nm-local-dir# find . -user aw -type d -ls
  93674 drwxr-s---   4 aw   yarn 4096 Aug  2 16:08 
./usercache/aw
  93684 drwxr-s---   3 aw   yarn 4096 Aug  2 16:08 
./usercache/aw/appcache
  93704 drwxr-s---   7 aw   yarn 4096 Aug  2 16:08 
./usercache/aw/appcache/application_1470179247859_0001
{code}

After the job finished, the directories disappeared as expected.

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system-specific for security 
> reasons. We should add support for FreeBSD to unbreak its ports entry, for 
> NetBSD (the sysctl options are just in a different order), and, for operating 
> systems that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5468) Scheduling of long-running applications

2016-08-02 Thread Konstantinos Karanasos (JIRA)
Konstantinos Karanasos created YARN-5468:


 Summary: Scheduling of long-running applications
 Key: YARN-5468
 URL: https://issues.apache.org/jira/browse/YARN-5468
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: capacityscheduler, fairscheduler
Reporter: Konstantinos Karanasos
Assignee: Konstantinos Karanasos


This JIRA is about the scheduling of applications with long-running tasks.
It will include adding support to YARN for a richer set of scheduling 
constraints (such as affinity, anti-affinity, cardinality and time 
constraints), and extending the schedulers to take them into account during 
placement of containers to nodes.
We plan to have both an online version that will accommodate such requests as 
they arrive, as well as a Long-running Application Planner that will make more 
global decisions by considering multiple applications at once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5444) Fix failing unit tests in TestLinuxContainerExecutorWithMocks

2016-08-02 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404981#comment-15404981
 ] 

Yufei Gu commented on YARN-5444:


Thanks a lot for the review and committing, [~vvasudev].

> Fix failing unit tests in TestLinuxContainerExecutorWithMocks
> -
>
> Key: YARN-5444
> URL: https://issues.apache.org/jira/browse/YARN-5444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-5444.001.patch
>
>
> Test cases {{testLaunchCommandWithoutPriority}} and {{testStartLocalizer}} are 
> based on the assumption that YARN configuration files won't be loaded, which 
> is not true in some situations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3664) Federation PolicyStore internal APIs

2016-08-02 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-3664:
-
Attachment: YARN-3664-YARN-2915-v2.patch

[~vvasudev], thanks for your feedback.

Attaching patch (v2) that incorporates your comments. Similar to YARN-5307, the 
names are definitely along those lines but not exactly what you suggested, as I 
have tried to align with the final version of YARN-3662, which includes 
[~vinodkv]/[~leftnoteasy]'s feedback too.

bq. Can we use something other than ByteBuffer for getParams - this'll become a 
problem if you ever expose this information via REST API or wish to update the 
object via a REST API (marshalling/unmarshalling ByteBuffer can be painful)

I agree with your observation, but we couldn't think of a better alternative 
based on the current understanding of the policy space (refer to 
YARN-5324/YARN-5325). Also, since we have established that this is an internal 
API, I feel we can revisit it once the dust settles on the policies after 
testing. So I have left it as ByteBuffer for now.


> Federation PolicyStore internal APIs
> 
>
> Key: YARN-3664
> URL: https://issues.apache.org/jira/browse/YARN-3664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-3664-YARN-2915-v0.patch, 
> YARN-3664-YARN-2915-v1.patch, YARN-3664-YARN-2915-v2.patch
>
>
> The federation Policy Store contains information about the capacity 
> allocations made by users, their mapping to sub-clusters, and the policies 
> that each of the components (Router, AMRMProxy, RMs) should enforce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404966#comment-15404966
 ] 

Hadoop QA commented on YARN-5327:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 29s {color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedLogFormat |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821701/YARN-5327.003.patch |
| JIRA Issue | YARN-5327 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux e19297eedbf8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d28c2d9 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12618/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
 |
| unit test logs |  

[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404963#comment-15404963
 ] 

Junping Du commented on YARN-4676:
--

bq. If the NM crashes (for example, a JVM exit due to running out of heap), it 
is supposed to restart automatically instead of waiting for a human to start 
it. Isn't that the general practice?
I don't think this is the general case, as YARN deployments can vary - in many 
cases (especially in on-premise environments), the NM is not supposed to be so 
fragile, and the admin needs to figure out what is happening before the NM 
crashes. Also, even if we want the NM to restart immediately (without human 
assistance/troubleshooting), the auto-restart logic lives outside of YARN and 
belongs to cluster deployment/monitoring tools like Ambari. Here, we had 
better not make too many assumptions.

bq. But nothing prevents the NM daemon from restarting, whether automatically 
or by a human. When such an NM restarts, it will try to register itself with 
the RM and will be told to shut down if it still appears in the exclude list. 
Such a node will remain DECOMMISSIONED inside the RM until, 10+ minutes later, 
it turns into LOST after the EXPIRE event.
As I said above, this belongs to the admin's behavior or your monitoring 
tool's logic. If an admin insists on repeatedly starting an NM that belongs to 
a decommissioned node, YARN can do nothing about it but keep shutting the NM 
down. Such a node should always stay DECOMMISSIONED, and I don't see any 
benefit in moving it to the EXPIRE status.

bq. Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is 
removed from the exclude list), during which it transitions into the RUNNING 
state.
I don't see how this hack brings any benefit compared with running 
refreshNodes after moving the node to the include list and restarting the NM 
daemon, which goes through the normal registration process. The risk is that 
we need to take care of a separate code path dedicated to this minor case.

bq. This behavior appears to me as robust rather than a hack. It appears that 
the behavior you expected relies on a separate mechanism that permanently 
shuts down the NM once it is DECOMMISSIONED.
I have never heard that we need a separate mechanism to shut down the NM once 
it is decommissioned. It should be built-in behavior for Apache Hadoop YARN so 
far. Are you talking about a private/specific branch rather than the current 
trunk/branch-2?

bq. As long as such a DECOMMISSIONED node never tries to register or be 
recommissioned, then yes, I expect the transitions you listed could be removed.
The re-registration of a node after a refreshNodes operation goes through the 
normal registration process, which is good enough for me. I don't think we 
need changes here unless we have strong reasons. So, yes, please remove these 
transitions, because they are not correct based on current YARN logic.

bq. So I see these transitions as really needed. That said, I could remove 
them and maintain them privately inside the EMR branch for the sake of getting 
this JIRA going.
I can understand the pain of maintaining a private branch - from the 
perspective of your private (EMR) branch, these pieces of code could be 
needed. However, as a community contributor, you have to switch roles and 
stand with the community code base in trunk/branch-2, and we committers can 
only help get in pieces of code that benefit the whole community. If these 
pieces of code are important for another story (like resource elasticity in 
YARN) that benefits the community, we can move them out to another dedicated 
work item, but we need an open discussion on the design/implementation first - 
that is the right process for patch/feature contribution.

bq. These transitions have been there almost since the beginning of this JIRA; 
any other comments/surprises?
These issues already surprise me enough - these transitions in RMNode belong 
to very key logic in YARN, and we need to be careful as always. I need more 
time to review the rest of the code. Hopefully, I can finish my first round 
tomorrow and publish the remaining comments.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, 

[jira] [Commented] (YARN-5406) In-memory based implementation of the FederationMembershipStateStore

2016-08-02 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404941#comment-15404941
 ] 

Subru Krishnan commented on YARN-5406:
--

Thanks [~ellenfkh] for the patch and [~jianhe] for the review.

I have a few minor comments:
  * I agree with [~jianhe] that the _impl_ package should be a sub-package of 
the _store_ package.
  * Rename {{FederationInMemoryMembershipStateStore}} --> 
{{MemoryFederationStateStore}} and the corresponding test.
  * We need to validate the inputs (e.g. null checks). Since this is common 
across the different store implementations, I have created YARN-5467 to track 
this.
  * All the tests are for positive cases; can we add a few for negative cases?
  * I think we should add an _isSubClusterActive_ method to {{SubClusterState}} 
and use it (a rough sketch follows at the end of this comment).
  * Can you update the Javadoc for _FilterInactiveSubClusters_ as requested by 
[~jianhe]?
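
A rough sketch of the _isSubClusterActive_ suggestion, purely for illustration 
(the actual SubClusterState values in the YARN-2915 branch may differ, and 
whether newly registered sub-clusters should count as active is left to the 
implementation):
{code}
// Illustrative only; the real SubClusterState in the YARN-2915 branch may differ.
public enum SubClusterState {
  SC_NEW, SC_RUNNING, SC_UNHEALTHY, SC_DECOMMISSIONED, SC_LOST, SC_UNREGISTERED;

  /** Treat a sub-cluster as active only while it is running. */
  public boolean isActive() {
    return this == SC_RUNNING;
  }
}
{code}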

> In-memory based implementation of the FederationMembershipStateStore
> 
>
> Key: YARN-5406
> URL: https://issues.apache.org/jira/browse/YARN-5406
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Ellen Hui
> Attachments: YARN-5406-YARN-2915.v0.patch
>
>
> YARN-3662 defines the FederationMembershipStateStore API. This JIRA tracks an 
> in-memory based implementation which is useful for both single-box testing 
> and for future unit tests that depend on the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5467) InputValidator for the FederationStateStore internal APIs

2016-08-02 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-5467:


 Summary: InputValidator for the FederationStateStore internal APIs
 Key: YARN-5467
 URL: https://issues.apache.org/jira/browse/YARN-5467
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Subru Krishnan
Assignee: Giovanni Matteo Fumarola


We need to check the mandatory fields, well-formedness (for address fields), 
etc. of the input params to the FederationStateStore. This is common across all 
Store implementations and can be used as a _fail-fast_ mechanism on the client 
side.
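
A minimal sketch of what such a fail-fast validator could look like; the class 
and method names here are hypothetical, not the eventual YARN-5467 API:
{code}
// Hypothetical sketch of a client-side fail-fast validator for store inputs.
public final class FederationStateStoreInputValidatorSketch {

  private FederationStateStoreInputValidatorSketch() {
  }

  /** Mandatory field check: reject null or empty identifiers up front. */
  public static void validateSubClusterId(String subClusterId) {
    if (subClusterId == null || subClusterId.isEmpty()) {
      throw new IllegalArgumentException("SubClusterId cannot be null or empty");
    }
  }

  /** Well-formedness check for an address field such as "host:port". */
  public static void validateAddress(String address) {
    if (address == null || !address.matches("[^:\\s]+:\\d+")) {
      throw new IllegalArgumentException("Malformed address: " + address);
    }
  }
}
{code}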



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5390) Federation Subcluster Resolver

2016-08-02 Thread Ellen Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403189#comment-15403189
 ] 

Ellen Hui edited comment on YARN-5390 at 8/2/16 10:38 PM:
--

Hi [~leftnoteasy], thanks for the quick feedback!

* This interface will be used in the three patches you looked at, although you 
are correct that they have not been updated yet. For instance, the 
LocalityMulticastAMRMProxyFederationPolicy prototype in YARN-5325 uses the 
FederationSubClusterResolver interface to split resource requests. There are 
some examples of the resolver being used in the splitResourceRequests method of 
that class, although some of the classnames are out of date (a simplified 
sketch of this splitting logic is included at the end of this comment). From 
the javadoc:

{panel}
host localized ResourceRequest are always forwarded to the RM
that owns the node, based on the feedback of a FederationSubClusterResolver

rack localized ResourceRequest are forwarded to the RM that owns
the rack (if the FederationSubClusterResolver provides this info) or
they are forwarded as if they were ANY (this is important for deployment that
stripe racks across sub-clusters) as there is not a single resolution.

ANY request corresponding to node/rack local requests are only forwarded
to the set of RMs that owns the node-local requests. The number of containers
listed in each ANY is proportional to the number of node-local container
requests (associated to this ANY via the same allocateRequestId) 
{panel}

* The FederationInterceptor from YARN-5325 will be responsible for managing the 
lifecycle of the SubClusterResolver.

* I think it's better to leave the SubClusterResolver methods non-static, since 
we want to allow the implementation to be pluggable and I can't think of a 
particular reason it should be static. Please let me know if you disagree, I 
may be missing something.

Thanks!
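
As promised above, here is a toy, self-contained illustration of the splitting 
rules quoted from the javadoc (plain strings and maps instead of the real YARN 
types; the actual logic lives in the YARN-5325 policy classes and is 
considerably richer):
{code}
import java.util.*;

// Toy illustration of the splitting rules quoted above; not YARN code.
public class RequestSplitSketch {

  // node/rack name -> owning sub-cluster, as a resolver would report it
  static final Map<String, String> NODE_OWNER = Map.of("node1", "scA", "node2", "scB");
  static final Map<String, String> RACK_OWNER = Map.of("/rack1", "scA");

  public static void main(String[] args) {
    // Rule 1: host-localized requests always go to the RM that owns the node.
    Map<String, Integer> nodeLocalCounts = new HashMap<>();
    for (String node : List.of("node1", "node1", "node2")) {
      nodeLocalCounts.merge(NODE_OWNER.get(node), 1, Integer::sum);
    }

    // Rule 2: rack-localized requests go to the RM owning the rack, if the
    // resolver knows it; otherwise they are treated like ANY requests.
    if (RACK_OWNER.get("/rack2") == null) {
      System.out.println("/rack2 is striped across sub-clusters: treat as ANY");
    }

    // Rule 3: ANY containers are split proportionally to the node-local
    // containers already routed to each sub-cluster (same allocateRequestId).
    int anyContainers = 6;
    int totalLocal = nodeLocalCounts.values().stream().mapToInt(Integer::intValue).sum();
    nodeLocalCounts.forEach((subCluster, n) -> System.out.println(
        subCluster + " receives " + (anyContainers * n / totalLocal) + " ANY containers"));
  }
}
{code}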


was (Author: ellenfkh):
Hi [~wangda], thanks for the quick feedback!

This interface will be used in the three patches you looked at, although you 
are correct that they have not been updated yet. For instance, the 
LocalityMulticastAMRMProxyFederationPolicy prototype in YARN-5325 uses the 
FederationSubClusterResolver interface to split resource requests. From the 
javadoc:

host localized ResourceRequest are always forwarded to the RM
that owns the node, based on the feedback of a FederationSubClusterResolver

rack localized ResourceRequest are forwarded to the RM that owns
the rack (if the FederationSubClusterResolver provides this info) or
they are forwarded as if they were ANY (this is important for deployment that
stripe racks across sub-clusters) as there is not a single resolution.

ANY request corresponding to node/rack local requests are only forwarded
to the set of RMs that owns the node-local requests. The number of containers
listed in each ANY is proportional to the number of node-local container
requests (associated to this ANY via the same allocateRequestId) 

There are some examples of the resolver being used in the splitResourceRequests 
method of the same class, although some of the classnames are out of date.


The FederationInterceptor from YARN-5325 will be responsible for managing the 
lifecycle of the SubClusterResolver.

I think it's better to leave the SubClusterResolver methods non-static, since 
we want to allow the implementation to be pluggable and I can't think of a 
particular reason it should be static. Please let me know if you disagree, I 
may be missing something.

Thanks!

> Federation Subcluster Resolver
> --
>
> Key: YARN-5390
> URL: https://issues.apache.org/jira/browse/YARN-5390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ellen Hui
> Attachments: YARN-5390-YARN-2915.v0.patch, 
> YARN-5390-YARN-2915.v1.patch, YARN-5390-YARN-2915.v2.patch
>
>
> This JIRA tracks the effort to create a mechanism to resolve node/rack 
> resource names to sub-cluster identifiers. This is needed by the federation 
> policies in YARN-5323, YARN-5324, and YARN-5325 to operate correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5327) API changes required to support recurring reservations in the YARN ReservationSystem

2016-08-02 Thread Sangeetha Abdu Jyothi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangeetha Abdu Jyothi updated YARN-5327:

Attachment: YARN-5327.003.patch

> API changes required to support recurring reservations in the YARN 
> ReservationSystem
> 
>
> Key: YARN-5327
> URL: https://issues.apache.org/jira/browse/YARN-5327
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Sangeetha Abdu Jyothi
> Attachments: YARN-5327.001.patch, YARN-5327.002.patch, 
> YARN-5327.003.patch
>
>
> YARN-5326 proposes adding native support for recurring reservations in the 
> YARN ReservationSystem. This JIRA is a sub-task to track the changes needed 
> in ApplicationClientProtocol to accomplish it. Please refer to the design doc 
> in the parent JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5456) container-executor support for FreeBSD, NetBSD, and others if conf path is absolute

2016-08-02 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404892#comment-15404892
 ] 

Chris Nauroth commented on YARN-5456:
-

[~aw], patch 01 looks good.  I verified this on OS X, Linux and FreeBSD.  It's 
cool to see the test passing on FreeBSD this time around!  My only other 
suggestion is to try deploying this change in a secured cluster for a bit of 
manual testing before we commit.

> container-executor support for FreeBSD, NetBSD, and others if conf path is 
> absolute
> ---
>
> Key: YARN-5456
> URL: https://issues.apache.org/jira/browse/YARN-5456
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
> Attachments: YARN-5456.00.patch, YARN-5456.01.patch
>
>
> YARN-5121 fixed quite a few portability issues, but it also changed how it 
> determines its location to be very operating-system-specific for security 
> reasons. We should add support for FreeBSD to unbreak its ports entry, for 
> NetBSD (the sysctl options are just in a different order), and, for operating 
> systems that do not have a defined method, an escape hatch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404788#comment-15404788
 ] 

Vrushali C edited comment on YARN-5382 at 8/2/16 9:08 PM:
--

This is with the last uploaded patch, v9 on branch-2.7 
(https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch). 
The 2.7 patch does not have logging in ClientRMService.

I see only one audit log message when I ran a sleep job and killed it on a 
pseudo-distributed setup on my laptop.

{code}
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i 
Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log  
| grep -i Kill
2016-08-02 14:00:19,186 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   IP=127.0.0.1OPERATION=Kill Application Request  TARGET=RMAppImpl 
   RESULT=SUCCESS  APPID=application_1470171585834_0001
2016-08-02 14:00:19,195 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   OPERATION=Application Finished - Killed TARGET=RMAppManager 
RESULT=SUCCES   APPID=application_1470171585834_0001
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$
{code}


On another window:
{code}
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar 
sleep -m 100 -r 1000 -mt 300 -rt 300
16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1470171585834_0001
16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application 
application_1470171585834_0001
16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: 
http://localhost:8088/proxy/application_1470171585834_0001/
16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001
^C
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn 
application -kill  application_1470171585834_0001
16/08/02 14:00:16 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
16/08/02 14:00:17 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Killing application application_1470171585834_0001
16/08/02 14:00:19 INFO impl.YarnClientImpl: Killed application 
application_1470171585834_0001
[t-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$
{code}

I need to update the trunk patch to remove the audit logging done upon the 
isAppFinalStateStored check.


was (Author: vrushalic):
This is with the last uploaded patch, v9 on branch-2.7 
(https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch). 
The 2.7 patch does not have logging in ClientRMService.

I see only one audit log message when I ran a sleep job and killed it on a 
pseudo-distributed setup on my laptop.

{code}
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i 
Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log  
| grep -i Kill
2016-08-02 14:00:19,186 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   IP=127.0.0.1OPERATION=Kill Application Request  TARGET=RMAppImpl 
   RESULT=SUCCESS  APPID=application_1470171585834_0001
2016-08-02 14:00:19,195 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   OPERATION=Application Finished - Killed TARGET=RMAppManager 
RESULT=SUCCES   APPID=application_1470171585834_0001
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$
{code}


On another window:
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar 
sleep -m 100 -r 1000 -mt 300 -rt 300
16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1470171585834_0001
16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application 
application_1470171585834_0001
16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: 
http://localhost:8088/proxy/application_1470171585834_0001/
16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001
^C
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn 
application -kill  

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404788#comment-15404788
 ] 

Vrushali C commented on YARN-5382:
--

This is with the last uploaded patch, v9 on branch-2.7 
(https://issues.apache.org/jira/secure/attachment/12821498/YARN-5382-branch-2.7.09.patch). 
The 2.7 patch does not have logging in ClientRMService.

I see only one audit log message when I ran a sleep job and killed it on a 
pseudo-distributed setup on my laptop.

{code}
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ grep -i 
Rmauditlogg logs/yarn-vchannapattan-resourcemanager-machine13-channapattan.log  
| grep -i Kill
2016-08-02 14:00:19,186 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   IP=127.0.0.1OPERATION=Kill Application Request  TARGET=RMAppImpl 
   RESULT=SUCCESS  APPID=application_1470171585834_0001
2016-08-02 14:00:19,195 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=vchannapattan 
   OPERATION=Application Finished - Killed TARGET=RMAppManager 
RESULT=SUCCES   APPID=application_1470171585834_0001
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$
{code}


On another window:
{code}
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/hadoop jar 
share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.4-SNAPSHOT.jar 
sleep -m 100 -r 1000 -mt 300 -rt 300
16/08/02 14:00:03 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
16/08/02 14:00:03 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: number of splits:100
16/08/02 14:00:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
job_1470171585834_0001
16/08/02 14:00:05 INFO impl.YarnClientImpl: Submitted application 
application_1470171585834_0001
16/08/02 14:00:05 INFO mapreduce.Job: The url to track the job: 
http://localhost:8088/proxy/application_1470171585834_0001/
16/08/02 14:00:05 INFO mapreduce.Job: Running job: job_1470171585834_0001
^C
[machine13-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$ bin/yarn 
application -kill  application_1470171585834_0001
16/08/02 14:00:16 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
16/08/02 14:00:17 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
Killing application application_1470171585834_0001
16/08/02 14:00:19 INFO impl.YarnClientImpl: Killed application 
application_1470171585834_0001
[t-channapattan hadoop-2.7.4-SNAPSHOT (branch-2.7)]$
{code}

I need to update the trunk patch to remove the audit logging done upon the 
isAppFinalStateStored check.

> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, 
> YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, 
> YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, 
> YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, 
> YARN-5382.08.patch, YARN-5382.09.patch
>
>
> ClientRMService will audit a kill request but only if it either fails to 
> issue the kill or if the kill is sent to an already finished application.  It 
> does not create a log entry when the application is active which is arguably 
> the most important case to audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5307) Federation Application State Store internal APIs

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404774#comment-15404774
 ] 

Hadoop QA commented on YARN-5307:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
7s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} YARN-2915 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 2s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821681/YARN-5307-YARN-2915-v4.patch
 |
| JIRA Issue | YARN-5307 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  |
| uname | Linux 857cc5bf8d72 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-2915 / 22db8fd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12616/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12616/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Federation Application State Store internal APIs
> 
>
> Key: YARN-5307
> URL: https://issues.apache.org/jira/browse/YARN-5307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>

[jira] [Updated] (YARN-5307) Federation Application State Store internal APIs

2016-08-02 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-5307:
-
Attachment: YARN-5307-YARN-2915-v4.patch

Updated patch (v4) with minor typo fixes to private methods

> Federation Application State Store internal APIs
> 
>
> Key: YARN-5307
> URL: https://issues.apache.org/jira/browse/YARN-5307
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-5307-YARN-2915-v1.patch, 
> YARN-5307-YARN-2915-v2.patch, YARN-5307-YARN-2915-v3.patch, 
> YARN-5307-YARN-2915-v4.patch
>
>
> The Federation Application State encapsulates the mapping between an 
> application and its _home_ sub-cluster, i.e. the sub-cluster to which it is 
> submitted by the Router. Please refer to the design doc in the parent JIRA 
> for further details.
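
As a rough sketch of the kind of mapping this store answers for the Router (the 
class and method names here are illustrative, not the final YARN-5307 API):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; the real store API in YARN-5307 may differ.
public class ApplicationHomeSubClusterSketch {

  // applicationId -> home sub-cluster chosen by the Router at submission time
  private final Map<String, String> homeSubCluster = new ConcurrentHashMap<>();

  /** Remember where the Router submitted the application. */
  public void addApplicationHomeSubCluster(String applicationId, String subClusterId) {
    homeSubCluster.putIfAbsent(applicationId, subClusterId);
  }

  /** Look up the sub-cluster that owns the application, or null if unknown. */
  public String getApplicationHomeSubCluster(String applicationId) {
    return homeSubCluster.get(applicationId);
  }
}
{code}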



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404694#comment-15404694
 ] 

Jason Lowe commented on YARN-5382:
--

If we keep the kill success logging in both a transition and in ClientRMService 
then we'll get two audit logs instead of one.

I also don't think it's as simple as removing the one from 
KillAttemptTransition since then we won't get a log if the RM fails over just 
as it saved the killed state of an app but before it executed the 
AppKilledTransition.  IMHO we need to log it once, before we enter the 
FINAL_SAVING state to record the killed transition.  Then we might get two 
audit logs during a failover (one on each RM instance) but that's far 
preferable to none.
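
For illustration only, the single audit point described above would follow the 
same call pattern already visible in the RMAuditLogger output elsewhere in this 
thread. This is a fragment, not a compilable class; which transition it lands 
in, and the exact constants, are precisely what is still being settled here:
{code}
// Sketch only: record the kill exactly once, in the transition taken just
// before the app enters FINAL_SAVING, so that a failover between the state
// save and AppKilledTransition cannot lose the audit record.
public void transition(RMAppImpl app, RMAppEvent event) {
  RMAuditLogger.logSuccess(app.getUser(),
      RMAuditLogger.AuditConstants.KILL_APP_REQUEST,
      "RMAppImpl", app.getApplicationId());
  // ... existing kill handling (diagnostics, state-store save) continues ...
}
{code}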


> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, 
> YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, 
> YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, 
> YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, 
> YARN-5382.08.patch, YARN-5382.09.patch
>
>
> ClientRMService will audit a kill request but only if it either fails to 
> issue the kill or if the kill is sent to an already finished application.  It 
> does not create a log entry when the application is active which is arguably 
> the most important case to audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404645#comment-15404645
 ] 

Daniel Zhi commented on YARN-4676:
--

If the NM crashes (for example, a JVM exit due to running out of heap), it is 
supposed to restart automatically instead of waiting for a human to start it. 
Isn't that the general practice? The NM code, upon receiving a shutdown from 
the RM, will exit itself. But nothing prevents the NM daemon from restarting, 
whether automatically or by a human. When such an NM restarts, it will try to 
register itself with the RM and will be told to shut down if it still appears 
in the exclude list. Such a node will remain DECOMMISSIONED inside the RM 
until, 10+ minutes later, it turns into LOST after the EXPIRE event.

Such a DECOMMISSIONED node can be recommissioned (refreshNodes after it is 
removed from the exclude list), during which it transitions into the RUNNING 
state.

This behavior appears to me as robust rather than a hack. It appears that the 
behavior you expected relies on a separate mechanism that permanently shuts 
down the NM once it is DECOMMISSIONED. As long as such a DECOMMISSIONED node 
never tries to register or be recommissioned, then yes, I expect the 
transitions you listed could be removed.

So I see these transitions as really needed. That said, I could remove them 
and maintain them privately inside the EMR branch for the sake of getting this 
JIRA going.

These transitions have been there almost since the beginning of this JIRA; any 
other comments/surprises?





> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all 
> affected nodes to kick off decommission or recommission actions. The RM 
> asynchronously tracks container and application status related to 
> DECOMMISSIONING nodes so that it can decommission the nodes immediately once 
> they are ready to be decommissioned. A decommissioning timeout at 
> individual-node granularity is supported and can be dynamically updated. The 
> mechanism naturally supports multiple independent graceful decommissioning 
> “sessions”, where each one involves different sets of nodes with different 
> timeout settings. Such support is ideal and necessary for graceful 
> decommission requests issued by external cluster management software instead 
> of by a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks the status 
> of DECOMMISSIONING nodes automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks their status 
> to decide when, after all running containers on a node have completed, the 
> node will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.
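
A toy sketch of the tracking described above (not the actual 
DecommissioningNodeWatcher code; the timeout and container bookkeeping shown 
here are purely illustrative):
{code}
import java.util.*;

// Toy sketch of the tracking idea described above; not the real watcher.
class DecommissioningWatcherSketch {

  static final class TrackedNode {
    long deadlineMillis;     // per-node decommissioning timeout
    int runningContainers;   // refreshed from node heartbeats
  }

  private final Map<String, TrackedNode> decommissioning = new HashMap<>();

  /** Invoked periodically: pick the DECOMMISSIONING nodes that can be finished. */
  List<String> readyToDecommission(long nowMillis) {
    List<String> ready = new ArrayList<>();
    for (Map.Entry<String, TrackedNode> e : decommissioning.entrySet()) {
      TrackedNode node = e.getValue();
      // A node is decommissioned once its containers complete or its timeout expires.
      if (node.runningContainers == 0 || nowMillis >= node.deadlineMillis) {
        ready.add(e.getKey());
      }
    }
    return ready;
  }
}
{code}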



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4888) Changes in RM container allocation for identifying resource-requests explicitly

2016-08-02 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404644#comment-15404644
 ] 

Subru Krishnan commented on YARN-4888:
--

The checkstyle issue is about a method taking more than 7 parameters, and the 
test case failure is unrelated.

> Changes in RM container allocation for identifying resource-requests 
> explicitly
> ---
>
> Key: YARN-4888
> URL: https://issues.apache.org/jira/browse/YARN-4888
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: YARN-4888-WIP.patch, YARN-4888-v0.patch, 
> YARN-4888-v2.patch, YARN-4888-v3.patch, YARN-4888-v4.patch, 
> YARN-4888-v5.patch, YARN-4888-v6.patch, YARN-4888.001.patch
>
>
> YARN-4879 puts forward the notion of identifying allocate requests 
> explicitly. This JIRA is to track the changes in RM app scheduling data 
> structures to accomplish it. Please refer to the design doc in the parent 
> JIRA for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404634#comment-15404634
 ] 

Vrushali C commented on YARN-5382:
--

Thanks [~jianhe] and [~jlowe].  
Apologies, I somehow missed that the success logging in ClientRMService on the 
isAppFinalStateStored check snuck back in. I think that happened when I rebased 
to the latest version during one of the patches. I will remove it now.

[~jianhe], would it then be okay to keep the logging in 
RMAppImpl#AppKilledTransition as well as in ClientRMService? I will remove the 
one in RMAppImpl#KillAttemptTransition.

> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, 
> YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, 
> YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, 
> YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, 
> YARN-5382.08.patch, YARN-5382.09.patch
>
>
> ClientRMService will audit a kill request but only if it either fails to 
> issue the kill or if the kill is sent to an already finished application.  It 
> does not create a log entry when the application is active, which is arguably 
> the most important case to audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5466) DefaultContainerExecutor needs JavaDocs

2016-08-02 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-5466:
---
Attachment: YARN-5466.001.patch

This patch adds JavaDocs and does some basic cleanup.  I'd love some 
confirmation that my interpretations of the methods are all accurate.

> DefaultContainerExecutor needs JavaDocs
> ---
>
> Key: YARN-5466
> URL: https://issues.apache.org/jira/browse/YARN-5466
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-5466.001.patch
>
>
> Following on YARN-5455, let's document the DefaultContainerExecutor as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404555#comment-15404555
 ] 

Junping Du commented on YARN-4676:
--

Thanks for sharing these details, Daniel. 
bq. In the typical EMR cluster scenario, a daemon like the NM is configured to 
auto-start if killed or shut down; however, the RM will reject such an NM if it 
appears in the exclude list.
In today's YARN (the community version), if the RM rejects an NM's register 
request, the NM gets terminated directly. I think we should follow the existing 
behavior, or there could be compatibility issues.

bq. A DECOMMISSIONED NM will try to register with the RM but will be rejected. 
It continues this loop until either: 1) the host is terminated; or 2) the host 
is recommissioned. The DECOMMISSIONED->LOST transition was likely added as 
defensive coding; without it, an invalid event exception is thrown.
I can understand that we want the scale-in and scale-out capability here for 
the cluster's elasticity. However, I am not sure how much benefit we gain from 
this hack: it sounds like we are only saving the NM daemon start time, which is 
several seconds in most cases and trivial compared with container launching and 
running. Am I missing some other benefit here?

bq. The DECOMMISSIONED->LOST transition was likely added as defensive coding; 
without it, an invalid event exception is thrown.
As I mentioned above, we should stop watching DECOMMISSIONED nodes; it is 
unnecessary to consume RM resources to take care of them. If an EXPIRE event 
gets thrown in your case, then we should check what is wrong there (a race 
condition, etc.) and fix it.

bq. CLEANUP_CONTAINER and CLEANUP_APP were definitely added to prevent 
otherwise invalid event exceptions in the DECOMMISSIONED state
I can understand that we want to get rid of annoying invalid-transition 
messages in our logs. However, similar to what I mentioned above, we need to 
find out where we send these events and check whether these cases are valid or 
are bugs caused by race conditions, etc. Even if we are really sure that some 
of the events are hard to get rid of, we should make the transition empty here, 
as no logic in the transition is necessary.
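
For illustration, here is a minimal stand-alone sketch of the "empty 
transition" idea; the class, enum, and method names are hypothetical and this 
is not RMNodeImpl's actual state-machine API. The point is that the event is 
accepted (so no invalid-event error is raised) but no logic runs.
{code}
import java.util.EnumMap;
import java.util.Map;

public class EmptyTransitionSketch {
  enum NodeState { DECOMMISSIONED }
  enum EventType { CLEANUP_CONTAINER, CLEANUP_APP }

  interface Transition { NodeState apply(NodeState current, EventType event); }

  // An "empty" transition: accept the event, stay in the same state, do nothing else.
  static final Transition NOOP = (current, event) -> current;

  public static void main(String[] args) {
    Map<EventType, Transition> decommissionedTable = new EnumMap<>(EventType.class);
    decommissionedTable.put(EventType.CLEANUP_CONTAINER, NOOP);
    decommissionedTable.put(EventType.CLEANUP_APP, NOOP);

    NodeState state = NodeState.DECOMMISSIONED;
    state = decommissionedTable.get(EventType.CLEANUP_CONTAINER)
        .apply(state, EventType.CLEANUP_CONTAINER);
    // Still DECOMMISSIONED: the event was consumed without any side effects.
    System.out.println("State after CLEANUP_CONTAINER: " + state);
  }
}
{code}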

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all 
> affected nodes and kicks off decommission or recommission actions. The RM 
> asynchronously tracks container and application status related to 
> DECOMMISSIONING nodes so that it can decommission those nodes as soon as they 
> are ready to be decommissioned. A decommissioning timeout at individual-node 
> granularity is supported and can be dynamically updated. The mechanism 
> naturally supports multiple independent graceful decommissioning “sessions”, 
> where each one involves different sets of nodes with different timeout 
> settings. Such support is ideal and necessary for graceful decommission 
> requests issued by external cluster management software rather than by a 
> human.
> DecommissioningNodeWatcher inside ResourceTrackingService automatically and 
> asynchronously tracks the status of DECOMMISSIONING nodes after the 
> client/admin makes the graceful decommission request. It tracks that status 
> to decide when, once all running containers on the node have completed, the 
> node will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs

2016-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404483#comment-15404483
 ] 

Hudson commented on YARN-5226:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10194 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10194/])
YARN-5226. Remove AHS enable check from LogsCLI#fetchAMContainerLogs. 
(junping_du: rev 3818393297c7b337e380e8111a55f2ac4745cb83)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/LogsCLI.java


> remove AHS enable check from LogsCLI#fetchAMContainerLogs
> -
>
> Key: YARN-5226
> URL: https://issues.apache.org/jira/browse/YARN-5226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-5226.1.patch, YARN-5226.2.patch, YARN-5226.3.patch, 
> YARN-5226.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Daniel Zhi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404463#comment-15404463
 ] 

Daniel Zhi commented on YARN-4676:
--

I can clarify the scenarios:
1. DECOMMISSIONED->RUNNING happens due to the RECOMMISSION event, which is 
triggered when the node is removed from the exclude file (a node can be 
dynamically excluded or included). In the typical EMR cluster scenario, a 
daemon like the NM is configured to auto-start if killed or shut down; however, 
the RM will reject such an NM if it appears in the exclude list.
2. Related to 1, a DECOMMISSIONED NM, upon auto-restart, will try to register 
with the RM but will be rejected. It continues this loop until either: 1) the 
host is terminated; or 2) the host is recommissioned. The DECOMMISSIONED->LOST 
transition was likely added as defensive coding; without it, an invalid event 
exception is thrown.
3. CLEANUP_CONTAINER and CLEANUP_APP were definitely added to prevent otherwise 
invalid event exceptions in the DECOMMISSIONED state.

So the core reason for these transitions is that DECOMMISSIONED NMs are "active 
standby" (and could be RECOMMISSIONed at any moment without delay) until the 
hosts are terminated in the EMR scenario.
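
To make that loop concrete, here is a small self-contained sketch of the 
behavior described above; the names are hypothetical and this is not the actual 
RM registration or NodesListManager code. A DECOMMISSIONED node keeps retrying 
registration and only succeeds once it has been removed from the exclude list 
(or the host is terminated).
{code}
import java.util.HashSet;
import java.util.Set;

public class RegistrationLoopSketch {
  // The RM rejects registration while the host is still in the exclude list.
  static boolean registerWithRM(String host, Set<String> excludeList) {
    return !excludeList.contains(host);
  }

  public static void main(String[] args) {
    Set<String> excludeList = new HashSet<>();
    excludeList.add("nodeA");

    for (int attempt = 1; attempt <= 3; attempt++) {
      if (registerWithRM("nodeA", excludeList)) {
        System.out.println("attempt " + attempt + ": registered (node was recommissioned)");
        break;
      }
      System.out.println("attempt " + attempt + ": rejected; NM auto-restarts and retries");
      if (attempt == 2) {
        excludeList.remove("nodeA"); // operator removes the node from the exclude file
      }
    }
  }
}
{code}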

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all 
> affected nodes and kicks off decommission or recommission actions. The RM 
> asynchronously tracks container and application status related to 
> DECOMMISSIONING nodes so that it can decommission those nodes as soon as they 
> are ready to be decommissioned. A decommissioning timeout at individual-node 
> granularity is supported and can be dynamically updated. The mechanism 
> naturally supports multiple independent graceful decommissioning “sessions”, 
> where each one involves different sets of nodes with different timeout 
> settings. Such support is ideal and necessary for graceful decommission 
> requests issued by external cluster management software rather than by a 
> human.
> DecommissioningNodeWatcher inside ResourceTrackingService automatically and 
> asynchronously tracks the status of DECOMMISSIONING nodes after the 
> client/admin makes the graceful decommission request. It tracks that status 
> to decide when, once all running containers on the node have completed, the 
> node will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5466) DefaultContainerExecutor needs JavaDocs

2016-08-02 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-5466:
--

 Summary: DefaultContainerExecutor needs JavaDocs
 Key: YARN-5466
 URL: https://issues.apache.org/jira/browse/YARN-5466
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.8.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor


Following on YARN-5455, let's document the DefaultContainerExecutor as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs

2016-08-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404454#comment-15404454
 ] 

Junping Du commented on YARN-5226:
--

The test failure is not related. I have committed the v4 patch to trunk and 
branch-2. Thanks [~xgong] for the patch contribution!

> remove AHS enable check from LogsCLI#fetchAMContainerLogs
> -
>
> Key: YARN-5226
> URL: https://issues.apache.org/jira/browse/YARN-5226
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-5226.1.patch, YARN-5226.2.patch, YARN-5226.3.patch, 
> YARN-5226.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404441#comment-15404441
 ] 

Junping Du commented on YARN-4676:
--

The transitions below for RMNode don't sound correct:
{noformat}
+  .addTransition(NodeState.DECOMMISSIONED, NodeState.RUNNING,
+  RMNodeEventType.RECOMMISSION,
+  new RecommissionNodeTransition(NodeState.RUNNING))
+  .addTransition(NodeState.DECOMMISSIONED, NodeState.DECOMMISSIONED,
+  RMNodeEventType.CLEANUP_CONTAINER, new CleanUpContainerTransition())
+  .addTransition(NodeState.DECOMMISSIONED, NodeState.LOST,
+  RMNodeEventType.EXPIRE, new DeactivateNodeTransition(NodeState.LOST))
+  .addTransition(NodeState.DECOMMISSIONED, NodeState.DECOMMISSIONED,
+  RMNodeEventType.CLEANUP_APP, new CleanUpAppTransition())
{noformat}
1. An RMNode in DECOMMISSIONED status shouldn't transition to RUNNING. The only 
way to make a decommissioned node active again is through two steps: 1) put the 
node back into the RM's include-node list and call the refreshNodes CLI, and 
2) restart the NM so that it registers with the RM again. The difference from a 
DECOMMISSIONING node is that a decommissioning node is still running, so step 2 
is not needed. For a DECOMMISSIONED node, we never know when step 2 will 
happen, so we shouldn't mark the node as RUNNING.

2. Transitioning a node from DECOMMISSIONED to LOST is not necessary. A node in 
DECOMMISSIONED is already down and won't heartbeat to the RM again, so we stop 
the heartbeat monitor for that node and there is no chance for an EXPIRE event 
to be sent.

3. For CleanUpContainerTransition(), CleanUpAppTransition() (and 
AddContainersToBeRemovedFromNMTransition() in the existing code base), I don't 
think these are necessary, as an NM that gets decommissioned should already 
have its state cleaned up. This is different from an NM shutdown, where we keep 
containers running and keep the NM state up to date.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all 
> affected nodes and kicks off decommission or recommission actions. The RM 
> asynchronously tracks container and application status related to 
> DECOMMISSIONING nodes so that it can decommission those nodes as soon as they 
> are ready to be decommissioned. A decommissioning timeout at individual-node 
> granularity is supported and can be dynamically updated. The mechanism 
> naturally supports multiple independent graceful decommissioning “sessions”, 
> where each one involves different sets of nodes with different timeout 
> settings. Such support is ideal and necessary for graceful decommission 
> requests issued by external cluster management software rather than by a 
> human.
> DecommissioningNodeWatcher inside ResourceTrackingService automatically and 
> asynchronously tracks the status of DECOMMISSIONING nodes after the 
> client/admin makes the graceful decommission request. It tracks that status 
> to decide when, once all running containers on the node have completed, the 
> node will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2016-08-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404354#comment-15404354
 ] 

Robert Kanter commented on YARN-5465:
-

I think the second option is better.  Even though updating the timeout of a 
currently decommissioning node is harder, it's at least possible to have 
different sets of decommissioning nodes with different timeouts, which seems 
like a common scenario to me.  The first option doesn't allow you to do this at 
all.
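
As a rough illustration of that second option, here is a minimal sketch of 
per-call timeout bookkeeping; the names are hypothetical and this is not the 
actual DecommissioningNodeWatcher API. Each graceful refreshNodes call records 
its own per-node deadline, so a later call does not overwrite the timeout of 
nodes that are already decommissioning.
{code}
import java.util.HashMap;
import java.util.Map;

public class PerNodeTimeoutSketch {
  private final Map<String, Long> deadlineMillis = new HashMap<>();

  // Register nodes from one graceful-decommission call; keep any existing deadlines.
  void startDecommission(long timeoutSeconds, String... nodes) {
    long deadline = System.currentTimeMillis() + timeoutSeconds * 1000L;
    for (String node : nodes) {
      deadlineMillis.putIfAbsent(node, deadline);
    }
  }

  boolean timedOut(String node, long nowMillis) {
    Long deadline = deadlineMillis.get(node);
    return deadline != null && nowMillis >= deadline;
  }

  public static void main(String[] args) {
    PerNodeTimeoutSketch watcher = new PerNodeTimeoutSketch();
    watcher.startDecommission(120, "nodeA"); // like -refreshNodes -g 120 -server
    watcher.startDecommission(30, "nodeB");  // later call; nodeA's deadline is untouched
    long now = System.currentTimeMillis();
    System.out.println("nodeA timed out immediately? " + watcher.timedOut("nodeA", now));
    System.out.println("nodeB timed out immediately? " + watcher.timedOut("nodeB", now));
  }
}
{code}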

> Server-Side NM Graceful Decommissioning subsequent call behavior
> 
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts.  It does, however, let you easily update the timeout of currently 
> decommissioning nodes.
> Another behavior we could adopt is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2016-08-02 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-5465:
---

 Summary: Server-Side NM Graceful Decommissioning subsequent call 
behavior
 Key: YARN-5465
 URL: https://issues.apache.org/jira/browse/YARN-5465
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Kanter


The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has the 
following behavior when subsequent calls are made:
# Start a long-running job that has containers running on nodeA
# Add nodeA to the exclude file
# Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
decommissioning nodeA
# Wait 30 seconds
# Add nodeB to the exclude file
# Run {{-refreshNodes -g 30 -server}} (30sec)
# After 30 seconds, both nodeA and nodeB shut down

In a nutshell, issuing a subsequent call to gracefully decommission nodes 
updates the timeout for any currently decommissioning nodes.  This makes it 
impossible to gracefully decommission different sets of nodes with different 
timeouts.  It does, however, let you easily update the timeout of currently 
decommissioning nodes.

Another behavior we could adopt is this:
# {color:grey}Start a long-running job that has containers running on nodeA{color}
# {color:grey}Add nodeA to the exclude file{color}
# {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
decommissioning nodeA{color}
# {color:grey}Wait 30 seconds{color}
# {color:grey}Add nodeB to the exclude file{color}
# {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
# After 30 seconds, nodeB shuts down
# After 60 more seconds, nodeA shuts down

This keeps the nodes affected by each call to gracefully decommission nodes 
independent.  You can now have different sets of decommissioning nodes with 
different timeouts.  However, to update the timeout of a currently 
decommissioning node, you'd have to first recommission it, and then 
decommission it again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404351#comment-15404351
 ] 

Eric Badger commented on YARN-4717:
---

[~templedf], [~rkanter], can we cherry-pick this back to 2.7? I just saw this 
failure in our nightly build.

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-4717
> URL: https://issues.apache.org/jira/browse/YARN-4717
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4717.001.patch
>
>
> The same issue that was resolved by [~zxu] in YARN-3602 is back.  Looks like 
> the commons-io package throws an IAE instead of an IOE now if the directory 
> doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5226) remove AHS enable check from LogsCLI#fetchAMContainerLogs

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404349#comment-15404349
 ] 

Hadoop QA commented on YARN-5226:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 24s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 20s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestYarnClient |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821422/YARN-5226.4.patch |
| JIRA Issue | YARN-5226 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8bc210e63249 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b3018e7 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12614/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12614/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12614/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12614/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> remove AHS enable check from LogsCLI#fetchAMContainerLogs
> 

[jira] [Resolved] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger resolved YARN-5463.
---
Resolution: Duplicate

Closing as a dup of YARN-4717. Not sure how I didn't see the old one before I 
opened this one. Oops.

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-5463
> URL: https://issues.apache.org/jira/browse/YARN-5463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5463.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404324#comment-15404324
 ] 

Hadoop QA commented on YARN-5463:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-5463 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821644/YARN-5463.001.patch |
| JIRA Issue | YARN-5463 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12615/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-5463
> URL: https://issues.apache.org/jira/browse/YARN-5463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5463.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2016-08-02 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-5464:

Target Version/s: 2.9.0

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2016-08-02 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-5464:
---

 Summary: Server-Side NM Graceful Decommissioning with RM HA
 Key: YARN-5464
 URL: https://issues.apache.org/jira/browse/YARN-5464
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Kanter
Assignee: Robert Kanter






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking

2016-08-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404312#comment-15404312
 ] 

Robert Kanter commented on YARN-4676:
-

One final minor thing:
- In {{RMAdminCLI#refreshNodes}}, the client-side and server-side tracking are 
mutually exclusive, so we shouldn't need the 5-second grace period; it should 
only ever be one or the other.

+1 after that.

[~djp], can you also take a look at the latest patch?

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> 
>
> Key: YARN-4676
> URL: https://issues.apache.org/jira/browse/YARN-4676
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Zhi
>Assignee: Daniel Zhi
>  Labels: features
> Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission YARN nodes. After the user issues the refreshNodes 
> request, the ResourceManager automatically evaluates the status of all 
> affected nodes and kicks off decommission or recommission actions. The RM 
> asynchronously tracks container and application status related to 
> DECOMMISSIONING nodes so that it can decommission those nodes as soon as they 
> are ready to be decommissioned. A decommissioning timeout at individual-node 
> granularity is supported and can be dynamically updated. The mechanism 
> naturally supports multiple independent graceful decommissioning “sessions”, 
> where each one involves different sets of nodes with different timeout 
> settings. Such support is ideal and necessary for graceful decommission 
> requests issued by external cluster management software rather than by a 
> human.
> DecommissioningNodeWatcher inside ResourceTrackingService automatically and 
> asynchronously tracks the status of DECOMMISSIONING nodes after the 
> client/admin makes the graceful decommission request. It tracks that status 
> to decide when, once all running containers on the node have completed, the 
> node will be transitioned into the DECOMMISSIONED state. NodesListManager 
> detects and handles include and exclude list changes to kick off 
> decommission or recommission as necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5463:
--
Attachment: YARN-5463.001.patch

Attaching a patch to catch and ignore IllegalArgumentException along with 
IOException.
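
A minimal sketch of the kind of cleanup guard being described (a hypothetical 
helper, not the exact test code): commons-io can throw IllegalArgumentException 
rather than IOException when the directory no longer exists, so both are 
swallowed during cleanup.
{code}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public class CleanupSketch {
  // Best-effort cleanup: the directory may already be gone, so ignore either failure.
  static void cleanupQuietly(File dir) {
    try {
      FileUtils.deleteDirectory(dir);
    } catch (IOException | IllegalArgumentException e) {
      // Ignored: cleanup is not the behavior under test.
    }
  }

  public static void main(String[] args) {
    cleanupQuietly(new File("/tmp/yarn-test-localdir-that-does-not-exist"));
    System.out.println("cleanup attempted");
  }
}
{code}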

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-5463
> URL: https://issues.apache.org/jira/browse/YARN-5463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5463.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Eric Badger (JIRA)
Eric Badger created YARN-5463:
-

 Summary: 
TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
Intermittently due to IllegalArgumentException from cleanup
 Key: YARN-5463
 URL: https://issues.apache.org/jira/browse/YARN-5463
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5463) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-08-02 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404293#comment-15404293
 ] 

Eric Badger commented on YARN-5463:
---

YARN-3602 fixed the IOException case; we now need to handle 
IllegalArgumentException as well.

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-5463
> URL: https://issues.apache.org/jira/browse/YARN-5463
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404238#comment-15404238
 ] 

Hadoop QA commented on YARN-5462:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 24s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 26m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821637/YARN-5462.001.patch |
| JIRA Issue | YARN-5462 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 69c7294b0aa4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7fc70c6 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12613/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12613/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails 
> intermittently
> --
>
> Key: YARN-5462
> URL: https://issues.apache.org/jira/browse/YARN-5462
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5462.001.patch
>
>
> {noformat}
> java.io.IOException: Failed on local exception: 

[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404243#comment-15404243
 ] 

Hadoop QA commented on YARN-5333:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 21s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 254 unchanged - 0 fixed = 255 total (was 254) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 37m 22s 
{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 50s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821629/YARN-5333.06.patch |
| JIRA Issue | YARN-5333 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0eb4108d33ba 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7fc70c6 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/12612/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12612/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12612/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: 

[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission

2016-08-02 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404235#comment-15404235
 ] 

Varun Vasudev commented on YARN-5287:
-

[~Ying Zhang] the patch no longer applies cleanly on trunk.

It would be great if you could make a few minor changes:
1)
{code}
+/**
+ *  * Function to prepare the container directories.
+ *   * It creates the container work and log directories.
+ **/
{code}
Please change the comment to follow the same format as the other comments (no 
need for the extra "*" on the individual lines); a reformatted version is 
sketched after these points.

2)
{code}
+// This test is used to verify that app and container directories can be
+// created with required permissions when umask has been set to a restrictive
+// value of 077.
{code}
Change the formatting to follow {code}/** .. */{code}

3)
{code}+  //Create container directories for "app_5"{code}
Add a space between // and Create.
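
For reference, a sketch of how the comment from point 1 might look once 
reformatted as requested (same text, just the standard block-comment layout):
{code}
/**
 * Function to prepare the container directories.
 * It creates the container work and log directories.
 */
{code}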

> LinuxContainerExecutor fails to set proper permission
> -
>
> Key: YARN-5287
> URL: https://issues.apache.org/jira/browse/YARN-5287
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Attachments: YARN-5287-tmp.patch, YARN-5287.003.patch, 
> YARN-5287.004.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> LinuxContainerExecutor fails to set the proper permissions on the local 
> directories (i.e., /hadoop/yarn/local/usercache/... by default) if the cluster 
> has been configured with a restrictive umask, e.g.: umask 077. Job failed due 
> to the following reason:
> Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has 
> permission 700 but needs permission 750



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission

2016-08-02 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404170#comment-15404170
 ] 

Naganarasimha G R commented on YARN-5287:
-

Thanks for the patch, [~Ying Zhang].
+1, the latest patch LGTM. If there are no more comments, I will wait for some 
time and then commit it.

> LinuxContainerExecutor fails to set proper permission
> -
>
> Key: YARN-5287
> URL: https://issues.apache.org/jira/browse/YARN-5287
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Attachments: YARN-5287-tmp.patch, YARN-5287.003.patch, 
> YARN-5287.004.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> LinuxContainerExecutor fails to set the proper permissions on the local 
> directories (i.e., /hadoop/yarn/local/usercache/... by default) if the cluster 
> has been configured with a restrictive umask, e.g.: umask 077. Job failed due 
> to the following reason:
> Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has 
> permission 700 but needs permission 750



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently

2016-08-02 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5462:
--
Attachment: YARN-5462.001.patch

Attaching a patch that adds an extra barrier to the serviceStop method for the 
NM. This way the RPC interfaces won't get torn down before the container gets 
started, so the connection won't be dropped.
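
A stand-alone sketch of the synchronization idea (hypothetical names; not the 
actual TestNodeStatusUpdater patch, which may differ in detail): the shutdown 
path waits on a shared barrier until the test reports that the container was 
started, so the RPC server is not torn down while startContainers() is still in 
flight.
{code}
import java.util.concurrent.CyclicBarrier;

public class ServiceStopBarrierSketch {
  // Shared between the "NM shutdown" thread and the test thread.
  static final CyclicBarrier CONTAINER_STARTED = new CyclicBarrier(2);

  static void serviceStop() throws Exception {
    // Wait until the test thread signals that the container start call returned,
    // then proceed with tearing down the RPC interfaces.
    CONTAINER_STARTED.await();
    System.out.println("RPC interfaces torn down after container start");
  }

  public static void main(String[] args) throws Exception {
    Thread nm = new Thread(() -> {
      try {
        serviceStop();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    });
    nm.start();

    System.out.println("test: startContainers() returned");
    CONTAINER_STARTED.await(); // release the shutdown path
    nm.join();
  }
}
{code}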

> TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails 
> intermittently
> --
>
> Key: YARN-5462
> URL: https://issues.apache.org/jira/browse/YARN-5462
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: YARN-5462.001.patch
>
>
> {noformat}
> java.io.IOException: Failed on local exception: java.io.IOException: 
> Connection reset by peer; Host Details : local host is: 
> "slave-02.adcd.infra.corp.gq1.yahoo.com/69.147.96.229"; destination host is: 
> "127.0.0.1":12345; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1457)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1390)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy78.startContainers(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:101)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:248)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1492)
> Caused by: java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>   at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>   at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:508)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1730)
>   at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1078)
>   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:977)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5462) TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails intermittently

2016-08-02 Thread Eric Badger (JIRA)
Eric Badger created YARN-5462:
-

 Summary: 
TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown fails 
intermittently
 Key: YARN-5462
 URL: https://issues.apache.org/jira/browse/YARN-5462
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
java.io.IOException: Failed on local exception: java.io.IOException: Connection 
reset by peer; Host Details : local host is: 
"slave-02.adcd.infra.corp.gq1.yahoo.com/69.147.96.229"; destination host is: 
"127.0.0.1":12345; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1390)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy78.startContainers(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:101)
at 
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:248)
at 
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1492)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at 
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:508)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at 
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1730)
at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1078)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:977)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-08-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404150#comment-15404150
 ] 

Jun Gong commented on YARN-5333:


Attaching a new patch.

According to the suggestion, I abstracted refreshXXXWithout functions that do 
the refresh without checking the RM status.

About the test case, it needs to be bound to a specific scheduler (either 
CapacityScheduler or FairScheduler) to reproduce the error case, so there is no 
change for it. Is that OK?
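
For clarity, a rough sketch of the refactoring pattern described above (the 
method names are illustrative, not necessarily those in the patch):

{noformat}
// Illustrative sketch of the "refreshXXXWithout..." refactoring -- not the actual patch.
class AdminServiceSketch {

  /** Public admin entry point: still verifies the RM state before refreshing. */
  public void refreshQueues() throws Exception {
    checkRMStatus("refreshQueues");
    refreshQueuesWithoutCheck();
  }

  /** Used during transition-to-active, where the state check must be skipped. */
  void refreshQueuesWithoutCheck() throws Exception {
    // reload scheduler configuration, e.g. fair-scheduler.xml / capacity-scheduler.xml
  }

  private void checkRMStatus(String operation) throws Exception {
    // throw if the RM is not active (same behaviour as before the refactoring)
  }
}
{noformat}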

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, YARN-5333.06.patch
>
>
> Enable RM HA and use FairScheduler, with 
> {{yarn.scheduler.fair.allow-undeclared-pools}} set to false and 
> {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, change the file {{etc/hadoop/fair-scheduler.xml}} 
> on both RMs and add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and 
> recover the apps.
> However, the new active RM will put the recovered apps into the default queue 
> because it might not have loaded the new {{fair-scheduler.xml}} yet. We need 
> to call {{initScheduler}} before starting the active services, or move 
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is 
> also important for other schedulers*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-08-02 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-5333:
---
Attachment: YARN-5333.06.patch

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch, YARN-5333.06.patch
>
>
> Enable RM HA and use FairScheduler, with 
> {{yarn.scheduler.fair.allow-undeclared-pools}} set to false and 
> {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, change the file {{etc/hadoop/fair-scheduler.xml}} 
> on both RMs and add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and 
> recover the apps.
> However, the new active RM will put the recovered apps into the default queue 
> because it might not have loaded the new {{fair-scheduler.xml}} yet. We need 
> to call {{initScheduler}} before starting the active services, or move 
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is 
> also important for other schedulers*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5430) Get container's ip and host from NM

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404142#comment-15404142
 ] 

Hadoop QA commented on YARN-5430:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 56s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 34s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: The patch generated 16 
new + 160 unchanged - 1 fixed = 176 total (was 161) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 19s {color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 8s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 44m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
|  |  Format-string method String.format(String, Object[]) called with format 
string "Shell execution failed: ExitCode = %s Stderr: %s Stdout: %s Command:" 
wants 3 arguments but is given 4 in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(List,
 PrivilegedOperation, File, Map, boolean, boolean)  At 
PrivilegedOperationExecutor.java:[line 160] |
| Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedLogFormat |
|   | 

[jira] [Commented] (YARN-5382) RM does not audit log kill request for active applications

2016-08-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404090#comment-15404090
 ] 

Jason Lowe commented on YARN-5382:
--

bq. Does user expect audit logging both before killing and after killing 
successfully ?

Ideally from the ClientRMService perspective it should be logged when the 
request comes in, just like web servers audit log requests they serve.  
Unfortunately the polling-for-killed logic makes this messy to implement 
cleanly, so logging once when the app is killed would be the next best option, 
IMHO.

Sorry, I missed the fact that the success logging in ClientRMService was still 
there. At some point it was removed, but I missed that it did not stay that 
way. I also missed that AttemptKilledTransition and AppKilledTransition can 
both be triggered for an app being killed.
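
To make the two options concrete, here is a simplified sketch of the audit 
points being discussed (the class and helper below are stand-ins, not the 
actual ClientRMService or RMAuditLogger code):

{noformat}
// Simplified illustration of the two audit points discussed above; not actual YARN code.
class KillAuditSketch {

  /** Option 1: audit as soon as the kill request arrives, like a web server access log. */
  void forceKillApplication(String user, String appId) {
    auditLog(user, "Kill Application Request", appId);  // logged once, at receipt
    // ... issue the kill; the client then polls until the app reaches KILLED ...
  }

  /** Option 2: audit once, when the application actually reaches the KILLED state. */
  void onAppKilled(String user, String appId) {
    auditLog(user, "Application Killed", appId);
  }

  private void auditLog(String user, String operation, String appId) {
    System.out.println("AUDIT user=" + user + " operation=" + operation + " app=" + appId);
  }
}
{noformat}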

> RM does not audit log kill request for active applications
> --
>
> Key: YARN-5382
> URL: https://issues.apache.org/jira/browse/YARN-5382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Vrushali C
> Attachments: YARN-5382-branch-2.7.01.patch, 
> YARN-5382-branch-2.7.02.patch, YARN-5382-branch-2.7.03.patch, 
> YARN-5382-branch-2.7.04.patch, YARN-5382-branch-2.7.05.patch, 
> YARN-5382-branch-2.7.09.patch, YARN-5382.06.patch, YARN-5382.07.patch, 
> YARN-5382.08.patch, YARN-5382.09.patch
>
>
> ClientRMService will audit a kill request but only if it either fails to 
> issue the kill or if the kill is sent to an already finished application.  It 
> does not create a log entry when the application is active which is arguably 
> the most important case to audit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5461) Port initial slider-core module code into yarn

2016-08-02 Thread Jian He (JIRA)
Jian He created YARN-5461:
-

 Summary: Port initial slider-core module code into yarn
 Key: YARN-5461
 URL: https://issues.apache.org/jira/browse/YARN-5461
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5160) Add timeout when starting JobHistoryServer in MiniMRYarnCluster

2016-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404059#comment-15404059
 ] 

Hadoop QA commented on YARN-5160:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 115m 42s 
{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestMRCJCFileOutputCommitter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12821558/YARN-5160.01.patch |
| JIRA Issue | YARN-5160 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux e507e0c41f2d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 7fc70c6 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-YARN-Build/12610/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/12610/testReport/ |
| modules | C: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 U: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/12610/console |
| 

[jira] [Updated] (YARN-5430) Get container's ip and host from NM

2016-08-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-5430:
--
Attachment: YARN-5430.2.patch

> Get container's ip and host from NM
> ---
>
> Key: YARN-5430
> URL: https://issues.apache.org/jira/browse/YARN-5430
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5430.1.patch, YARN-5430.2.patch
>
>
> In YARN-4757, we introduced a DNS mechanism for containers. That is based on 
> the assumption that we can get the container's IP and host information and 
> store it in the registry-service. This JIRA aims to get the container's IP 
> and host from the NM, primarily for docker containers.
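
For reference, one common way to read those values for a Docker container from 
the NM host is via docker inspect format templates; whether the attached patch 
uses exactly this invocation is not shown in this thread:

{noformat}
# Placeholder container id; prints the container's IP address and hostname.
docker inspect --format '{{.NetworkSettings.IPAddress}} {{.Config.Hostname}}' <container-id>
{noformat}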



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand

2016-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403852#comment-15403852
 ] 

Hudson commented on YARN-5458:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10192 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10192/])
YARN-5458. Rename DockerStopCommandTest to TestDockerStopCommand. (vvasudev: 
rev 7fc70c6422da3602ad9d4364493f25454a1de50c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStopCommandTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStopCommand.java


> Rename DockerStopCommandTest to TestDockerStopCommand
> -
>
> Key: YARN-5458
> URL: https://issues.apache.org/jira/browse/YARN-5458
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
> Fix For: 2.9.0
>
> Attachments: YARN-5458.001.patch
>
>
> DockerStopCommandTest does not follow the naming convention for test classes; 
> rename it to TestDockerStopCommand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5443) Add support for docker inspect command

2016-08-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403853#comment-15403853
 ] 

Hudson commented on YARN-5443:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10192 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10192/])
YARN-5443. Add support for docker inspect command. Contributed by Shane 
(vvasudev: rev 2e7c2a13a853b8195bc4f51f6c3c1f61656c2b33)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerInspectCommand.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerInspectCommand.java


> Add support for docker inspect command
> --
>
> Key: YARN-5443
> URL: https://issues.apache.org/jira/browse/YARN-5443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Fix For: 2.9.0
>
> Attachments: YARN-5443.001.patch, YARN-5443.002.patch
>
>
> Similar to the DockerStopCommand and DockerRunCommand, it would be desirable 
> to have a DockerInspectCommand. The initial use is for retrieving a 
> container's status, but many other uses are possible (IP information, volume 
> information, etc).
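
For readers unfamiliar with docker inspect, the status-retrieval use mentioned 
above boils down to a format query like the following (the container id is a 
placeholder):

{noformat}
# Prints e.g. "running" or "exited" for the given container.
docker inspect --format '{{.State.Status}}' <container-id>
{noformat}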



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand

2016-08-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-5458:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-3611

> Rename DockerStopCommandTest to TestDockerStopCommand
> -
>
> Key: YARN-5458
> URL: https://issues.apache.org/jira/browse/YARN-5458
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
> Fix For: 2.9.0
>
> Attachments: YARN-5458.001.patch
>
>
> DockerStopCommandTest does not follow the naming convention for test classes; 
> rename it to TestDockerStopCommand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5443) Add support for docker inspect command

2016-08-02 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403826#comment-15403826
 ] 

Shane Kumpf commented on YARN-5443:
---

Thanks, [~vvasudev]!

> Add support for docker inspect command
> --
>
> Key: YARN-5443
> URL: https://issues.apache.org/jira/browse/YARN-5443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Fix For: 2.9.0
>
> Attachments: YARN-5443.001.patch, YARN-5443.002.patch
>
>
> Similar to the DockerStopCommand and DockerRunCommand, it would be desirable 
> to have a DockerInspectCommand. The initial use is for retrieving a 
> container's status, but many other uses are possible (IP information, volume 
> information, etc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand

2016-08-02 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403823#comment-15403823
 ] 

Shane Kumpf commented on YARN-5458:
---

Thanks [~vvasudev]!

> Rename DockerStopCommandTest to TestDockerStopCommand
> -
>
> Key: YARN-5458
> URL: https://issues.apache.org/jira/browse/YARN-5458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
> Fix For: 2.9.0
>
> Attachments: YARN-5458.001.patch
>
>
> DockerStopCommandTest does not follow the naming convention for test classes; 
> rename it to TestDockerStopCommand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5458) Rename DockerStopCommandTest to TestDockerStopCommand

2016-08-02 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403793#comment-15403793
 ] 

Varun Vasudev commented on YARN-5458:
-

+1, committing this.

> Rename DockerStopCommandTest to TestDockerStopCommand
> -
>
> Key: YARN-5458
> URL: https://issues.apache.org/jira/browse/YARN-5458
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Trivial
> Attachments: YARN-5458.001.patch
>
>
> DockerStopCommandTest does not follow the naming convention for test classes; 
> rename it to TestDockerStopCommand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5443) Add support for docker inspect command

2016-08-02 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-5443:

Summary: Add support for docker inspect command  (was: Add support for 
docker inspect)

> Add support for docker inspect command
> --
>
> Key: YARN-5443
> URL: https://issues.apache.org/jira/browse/YARN-5443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5443.001.patch, YARN-5443.002.patch
>
>
> Similar to the DockerStopCommand and DockerRunCommand, it would be desirable 
> to have a DockerInspectCommand. The initial use is for retrieving a 
> container's status, but many other uses are possible (IP information, volume 
> information, etc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory

2016-08-02 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403785#comment-15403785
 ] 

Shane Kumpf commented on YARN-5428:
---

We don't pass down the $HOME environment variable or expand out ~, so the 
default setting of ~/.docker/config.json will not be honored when running 
docker containers on YARN. 

This patch gives you a choice of where to store the config.json file. An 
administrator still needs to deploy config.json to the location specified by 
this configuration. Deploying a config.json pre-populated with credentials is 
an alternative to the interactive "docker login" command. Also note that other 
client configuration can be stored in config.json, such as output formatting 
rules, HTTP proxy settings, and a few others.
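
As a concrete illustration, a client config.json holding registry credentials 
typically looks like the following; the directory, registry hostname, image 
name, and base64 string are all placeholders:

{noformat}
$ cat /etc/hadoop/docker-client/config.json
{
  "auths": {
    "registry.example.com": {
      "auth": "dXNlcjpzZWNyZXQ="
    }
  }
}
$ docker --config /etc/hadoop/docker-client pull registry.example.com/myimage:latest
{noformat}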

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config, to avoid the need to docker login on each 
> cluster member. 
> By default the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching and it may also be desirable to centralize this configuration 
> beyond the single user's home directory.
> Note that the command line arg is for the configuration directory NOT the 
> configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within container executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5459) Add support for docker rm

2016-08-02 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403756#comment-15403756
 ] 

Shane Kumpf commented on YARN-5459:
---

Thanks [~tangzhankun] - the intent is to move away from running "docker rm" in 
the container executor and to let users control removal behavior through 
configuration. See YARN-5366.
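
For context, a loose sketch of what such a command wrapper tends to look like, 
modeled on the existing stop/run command classes; the real patch would extend 
DockerCommand, whose API is not reproduced here:

{noformat}
// Loose sketch of a "docker rm" wrapper; the real class would extend DockerCommand.
class DockerRmCommandSketch {
  private final String containerName;
  private boolean force = false;

  DockerRmCommandSketch(String containerName) {
    this.containerName = containerName;
  }

  /** Equivalent of "docker rm --force": remove even if the container is still running. */
  DockerRmCommandSketch withForce() {
    this.force = true;
    return this;
  }

  /** Assembles e.g. "docker rm --force container_id". */
  String getCommandWithArguments() {
    return "docker rm " + (force ? "--force " : "") + containerName;
  }
}
{noformat}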

> Add support for docker rm
> -
>
> Key: YARN-5459
> URL: https://issues.apache.org/jira/browse/YARN-5459
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Minor
> Attachments: YARN-5459.001.patch
>
>
> Add support for the docker rm command to be used for cleaning up exited and 
> failed containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-08-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403741#comment-15403741
 ] 

Jun Gong edited comment on YARN-5333 at 8/2/16 10:43 AM:
-

Thanks [~rohithsharma], [~jianhe] for the review and comments!

bq. 1. Should private boolean isTransitingToActive = false; be volatile?
Yes, it needs to be volatile. I'll update it.

{quote}
2. Since none of the refreshXXX methods are synchronized, the patch introduces 
a concurrency issue. If there is an explicit admin call for refreshing at the 
time of transitionToActive, then checkRMStatus will be executed for other admin 
calls. Until the RM has completely transitioned to active, explicit admin 
commands should not be allowed to refresh. I think we should do something 
similar to the refreshAdminAcl method.
{quote}
How about adding {{synchronized}} to each refresh function? It avoids adding 
more logic. When an admin command comes in, we could just call the 
corresponding refresh function. I think it does not matter if a refresh 
function is called multiple times.

bq. 3. I think the flag checkRMHAState can be passed to the method checkRMStatus.
I was considering that. If we add checkRMHAState to checkRMStatus, we need to 
add this parameter (checkRMHAState) to all the refresh functions too (similar 
to refreshAdminAcl), and there are a lot of places that call the refresh 
functions. It might be better to just add a check before checkRMStatus?

bq. I think if you can make the test general instead of specific to the fair 
scheduler, this test can be moved to the TestRMHA class. There is already a 
test, TestRMHA#testTransitionedToActiveRefreshFail; probably the same test can 
be changed?
Thanks. I'll update the test case.

{quote}
Instead of reusing the existing refreshAll method, I checked each refresh 
method; it should be cleaner to just create a new method which includes all the 
necessary reconfig steps. This also avoids unnecessary audit logs and ACL checks.
{quote}
Yes, it will be clearer to add a new method that includes all the reconfig 
steps. My concern is that there would then be two places doing similar reconfig 
work (one in the refresh functions, the other in the newly added method), so we 
would need to modify both places whenever one of them changes. I will try to 
refactor those refresh functions.


was (Author: hex108):
Thanks [~rohithsharma], [~jianhe] for the review and comments!

bq. 1. Should private boolean isTransitingToActive = false; be volatile?
Yes, it needs to be volatile. I'll update it.

{quote}
2. Since none of the refreshXXX methods are synchronized, the patch introduces 
a concurrency issue. If there is an explicit admin call for refreshing at the 
time of transitionToActive, then checkRMStatus will be executed for other admin 
calls. Until the RM has completely transitioned to active, explicit admin 
commands should not be allowed to refresh. I think we should do something 
similar to the refreshAdminAcl method.
{quote}
How about adding {{synchronized}} to each refresh function? It avoids adding 
more logic. When an admin command comes in, we could just call the 
corresponding refresh function. I think it does not matter if a refresh 
function is called multiple times.

bq. 3. I think the flag checkRMHAState can be passed to the method checkRMStatus.
I was considering that. If we add checkRMHAState to checkRMStatus, we need to 
add this parameter (checkRMHAState) to all the refresh functions too (similar 
to refreshAdminAcl), and there are a lot of places that call the refresh 
functions. It might be better to just add a check before checkRMStatus?

bq. I think if you can make the test general instead of specific to the fair 
scheduler, this test can be moved to the TestRMHA class. There is already a 
test, TestRMHA#testTransitionedToActiveRefreshFail; probably the same test can 
be changed?
Thanks. I'll update the test case.

{quote}
Instead of reusing the existing refreshAll method, I checked each refresh 
method; it should be cleaner to just create a new method which includes all the 
necessary reconfig steps. This also avoids unnecessary audit logs and ACL checks.
{quote}
Yes, it will be clearer to add a new method that includes all the reconfig 
steps. My concern is that there would then be two places doing similar reconfig 
work (one in the refresh functions, the other in the newly added method), so we 
would need to modify both places whenever one of them changes.

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch
>
>
> Enable RM HA and use FairScheduler, 
> {{yarn.scheduler.fair.allow-undeclared-pools}} is set to 

[jira] [Commented] (YARN-5333) Some recovered apps are put into default queue when RM HA

2016-08-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403741#comment-15403741
 ] 

Jun Gong commented on YARN-5333:


Thanks [~rohithsharma], [~jianhe] for the review and comments!

bq. 1. Should private boolean isTransitingToActive = false; be volatile?
Yes, it needs to be volatile. I'll update it.

{quote}
2. Since none of the refreshXXX methods are synchronized, the patch introduces 
a concurrency issue. If there is an explicit admin call for refreshing at the 
time of transitionToActive, then checkRMStatus will be executed for other admin 
calls. Until the RM has completely transitioned to active, explicit admin 
commands should not be allowed to refresh. I think we should do something 
similar to the refreshAdminAcl method.
{quote}
How about adding {{synchronized}} to each refresh function? It avoids adding 
more logic. When an admin command comes in, we could just call the 
corresponding refresh function. I think it does not matter if a refresh 
function is called multiple times.

bq. 3. I think the flag checkRMHAState can be passed to the method checkRMStatus.
I was considering that. If we add checkRMHAState to checkRMStatus, we need to 
add this parameter (checkRMHAState) to all the refresh functions too (similar 
to refreshAdminAcl), and there are a lot of places that call the refresh 
functions. It might be better to just add a check before checkRMStatus?

bq. I think if you can make the test general instead of specific to the fair 
scheduler, this test can be moved to the TestRMHA class. There is already a 
test, TestRMHA#testTransitionedToActiveRefreshFail; probably the same test can 
be changed?
Thanks. I'll update the test case.

{quote}
Instead of reusing the existing refreshAll method, I checked each refresh 
method; it should be cleaner to just create a new method which includes all the 
necessary reconfig steps. This also avoids unnecessary audit logs and ACL checks.
{quote}
Yes, it will be clearer to add a new method that includes all the reconfig 
steps. My concern is that there would then be two places doing similar reconfig 
work (one in the refresh functions, the other in the newly added method), so we 
would need to modify both places whenever one of them changes.
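
To make the proposal easier to follow, here is a rough sketch of the guard 
being discussed (the names follow this comment thread, not necessarily the 
final patch):

{noformat}
// Rough sketch of the guard discussed above; names follow the comments, not the final patch.
class AdminServiceGuardSketch {
  private volatile boolean isTransitingToActive = false;

  /** Explicit admin call: serialized, and still verifies that the RM is ready. */
  public synchronized void refreshQueues() throws Exception {
    checkRMStatus("refreshQueues", true);
    doRefreshQueues();
  }

  /** Refresh path used while transitioning to active: skips the active-state check. */
  synchronized void refreshDuringTransitionToActive() throws Exception {
    isTransitingToActive = true;
    try {
      checkRMStatus("refreshAll", false);
      doRefreshQueues();
      // ... other refreshXXX steps needed before the RM starts serving requests ...
    } finally {
      isTransitingToActive = false;
    }
  }

  private void checkRMStatus(String operation, boolean checkRMHAState) {
    // Explicit admin refreshes are rejected both while the RM is still transitioning
    // to active and when it is not active at all.
    if (checkRMHAState && (isTransitingToActive || !isActive())) {
      throw new IllegalStateException(operation + " rejected: ResourceManager is not ready");
    }
  }

  private boolean isActive() { return false; }   // placeholder for the real HA-state check
  private void doRefreshQueues() { }             // placeholder: reload scheduler config
}
{noformat}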

> Some recovered apps are put into default queue when RM HA
> -
>
> Key: YARN-5333
> URL: https://issues.apache.org/jira/browse/YARN-5333
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-5333.01.patch, YARN-5333.02.patch, 
> YARN-5333.03.patch, YARN-5333.04.patch, YARN-5333.05.patch
>
>
> Enable RM HA and use FairScheduler, with 
> {{yarn.scheduler.fair.allow-undeclared-pools}} set to false and 
> {{yarn.scheduler.fair.user-as-default-queue}} set to false.
> Steps to reproduce:
> 1. Start two RMs.
> 2. After the RMs are running, change the file {{etc/hadoop/fair-scheduler.xml}} 
> on both RMs and add some queues.
> 3. Submit some apps to the newly added queues.
> 4. Stop the active RM; the standby RM will then transition to active and 
> recover the apps.
> However, the new active RM will put the recovered apps into the default queue 
> because it might not have loaded the new {{fair-scheduler.xml}} yet. We need 
> to call {{initScheduler}} before starting the active services, or move 
> {{refreshAll()}} in front of {{rm.transitionToActive()}}. *It seems this is 
> also important for other schedulers*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity

2016-08-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403712#comment-15403712
 ] 

Sunil G commented on YARN-4091:
---

bq.1) Add more detailed diagnostic messages to apps/queues,  
bq.2) Merge pending application state into node allocation state.
Yes, this makes sense. We can spin off these improvements.

bq.What do you mean by target state? Could you please explain more?
bq.I think the priority attribute in response could indicate "priority level 
0". Do you think it is enough? So we could use "priority skipped"?

Yes, I will try to explain. When an AM container is allocated, the state of the 
app in the REST output is shown as ACCEPTED. Since we already allocated the AM 
container in this heartbeat, the state of the app will definitely become 
RUNNING/FAILED. So I was wondering whether it would be informative to show the 
target state along with the allocation/rejection, and how much it would help 
the user. This could be an enhancement; depending on the value of the use case, 
we can choose whether or not to do it.

> Add REST API to retrieve scheduler activity
> ---
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Chen Ge
> Attachments: Improvement on debugdiagnostic information - YARN.pdf, 
> SchedulerActivityManager-TestReport v2.pdf, 
> SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, 
> YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, 
> YARN-4091.5.patch, YARN-4091.5.patch, YARN-4091.6.patch, 
> YARN-4091.preliminary.1.patch, app_activities v2.json, app_activities.json, 
> node_activities v2.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more configurations 
> which tunes the schedulers starts to take actions such as limit assigning 
> containers to an application, or introduce delay to allocate container etc. 
> There are no clear information passed down from scheduler to outerworld under 
> these various scenarios. This makes debugging very tougher.
> This ticket is an effort to introduce more defined states on various parts in 
> scheduler where it skips/rejects container assignment, activate application 
> etc. Such information will help user to know whats happening in scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:40 AM:
---

Sorry, I had added a description of this issue when I created it, but it was 
not submitted to JIRA due to some problem. I will add the description as soon 
as possible.


was (Author: shurong.mai):
Sorry, I had added a description of this issue when I created this jira, but it 
was not submitted to JIRA due to some problem. I will add the description as 
soon as possible.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: mai shurong
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
> -gccause pid 1000 100) is as follows:
>   S0 S1 E  O  P YGC YGCTFGCFGCT GCT
> LGCC GCC 
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
> The nodemanager JVM process also hits this problem with either the CMS or 
> the G1 garbage collector.
> The parameters of CMS garbage collector are as following:
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of g1 garbage collector are as following:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
---

Sorry, I had added a description of this issue when I created this jira, but it 
was not submitted to JIRA due to some problem. I will add the description as 
soon as possible.


was (Author: shurong.mai):
Sorry, I had added a description of this issue when I created it, but it was 
not submitted to JIRA due to some problem. I will add the description as soon 
as possible.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: mai shurong
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
> -gccause pid 1000 100) is as follows:
>   S0 S1 E  O  P YGC YGCTFGCFGCT GCT
> LGCC GCC 
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
> The nodemanager JVM process also hits this problem with either the CMS or 
> the G1 garbage collector.
> The parameters of CMS garbage collector are as following:
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of g1 garbage collector are as following:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401469#comment-15401469
 ] 

mai shurong edited comment on YARN-5449 at 8/2/16 9:38 AM:
---

Sorry, I had added a description of this issue when I created it, but it was 
not submitted to JIRA due to some problem. I will add the description as soon 
as possible.


was (Author: shurong.mai):
Sorry, I had added my description, but it was not submitted to JIRA due to some 
problem. I will add the description as soon as possible.

> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: mai shurong
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
> -gccause pid 1000 100) is as follows:
>   S0 S1 E  O  P YGC YGCTFGCFGCT GCT
> LGCC GCC 
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
> GCG1 Evacuation Pause
> The nodemanager JVM process also hits this problem with either the CMS or 
> the G1 garbage collector.
> The parameters of CMS garbage collector are as following:
> -Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
> -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
> -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
> -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 
> The parameters of g1 garbage collector are as following:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
> -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
> -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
> -XX:+PrintAdaptiveSizePolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

2016-08-02 Thread mai shurong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mai shurong updated YARN-5449:
--
Description: 
The nodemanager process is hung(is not dead), and lost from resourcemanager.
The nodemanager's log is stopped from printing.
The used cpu of nodemanager process is very low(nearly 0%).
GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
-gccause pid 1000 100) is as follows:
  S0 S1 E  O  P YGC YGCTFGCFGCT GCTLGCC 
GCC 
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause

The nodemanager JVM process also hits this problem with either the CMS or the 
G1 garbage collector.

The parameters of CMS garbage collector are as following:
-Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
-XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
-XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 

The parameters of g1 garbage collector are as following:
-Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
-XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
-XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
-XX:+PrintAdaptiveSizePolicy


  was:
The nodemanager process is hung, and lost from resourcemanager.
The nodemanager's log is stopped from printing.
The used cpu of nodemanager process is very low(nearly 0%).
GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
-gccause pid 1000 100) is as follows:
  S0 S1 E  O  P YGC YGCTFGCFGCT GCTLGCC 
GCC 
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause
  0.00 100.00  95.06  24.08  30.46   3274  623.437 75.899  629.335 No 
GCG1 Evacuation Pause

The nodemanager JVM process also hits this problem with either the CMS or the 
G1 garbage collector.

The parameters of CMS garbage collector are as following:
-Xmx4096m  -Xmn1024m  -XX:PermSize=128m -XX:MaxPermSize=128m 
-XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 
-XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70 

The parameters of g1 garbage collector are as following:
-Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC  
-XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 
-XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4  
-XX:+PrintAdaptiveSizePolicy



> nodemanager process is hung, and lost from resourcemanager
> --
>
> Key: YARN-5449
> URL: https://issues.apache.org/jira/browse/YARN-5449
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>Reporter: mai shurong
>
> The nodemanager process is hung(is not dead), and lost from resourcemanager.
> The nodemanager's log is stopped from printing.
> The used cpu of nodemanager process is very low(nearly 0%).
> GC of nodemanager jvm process is stopped, and the result of jstat(jstat 
> -gccause pid 1000 100) is as follows:
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>   0.00 100.00  95.06  24.08  30.46   3274  623.437    7    5.899  629.335  No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437 7

[jira] [Updated] (YARN-5160) Add timeout when starting JobHistoryServer in MiniMRYarnCluster

2016-08-02 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated YARN-5160:
---
Attachment: YARN-5160.01.patch

Uploading the first patch.
I could not write a JUnit test: I would have needed to mock the 
{{JobHistoryServer}} object, but it is created inside {{serviceStart}}, so I 
could not mock it.
I did some manual testing and it worked.
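
For context, here is a minimal sketch of the kind of bounded wait such a change 
could add around the JobHistoryServer startup in {{MiniMRYarnCluster}}. The 
helper name, timeout value, and exception type are assumptions for illustration, 
not the contents of the attached patch:

{code:java}
// Sketch only: poll the JobHistoryServer's service state until it reaches
// STARTED, or fail after a fixed deadline instead of waiting forever.
private void waitForJobHistoryServerStarted(JobHistoryServer historyServer)
    throws IOException {
  final long timeoutMs = 60_000L;  // assumed timeout, not taken from the patch
  final long deadline = System.currentTimeMillis() + timeoutMs;
  while (historyServer.getServiceState() != Service.STATE.STARTED) {
    if (System.currentTimeMillis() > deadline) {
      throw new IOException(
          "JobHistoryServer did not start within " + timeoutMs + " ms");
    }
    try {
      Thread.sleep(100);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted while waiting for JobHistoryServer", ie);
    }
  }
}
{code}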

> Add timeout when starting JobHistoryServer in MiniMRYarnCluster
> ---
>
> Key: YARN-5160
> URL: https://issues.apache.org/jira/browse/YARN-5160
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Andras Bokor
>Assignee: Andras Bokor
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-5160.01.patch
>
>
> This JIRA is to follow up a TODO in MiniMRYarnCluster.
> {{//TODO Add a timeout. State.STOPPED check ?}}
> I think the State.STOPPED check is not needed; I do not see the value of 
> checking the STOPPED state here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3854) Add localization support for docker images

2016-08-02 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403564#comment-15403564
 ] 

Zhankun Tang commented on YARN-3854:


Yes. It seems that our direction is toward "docker pull" during localization. 
Will we simply discard the "HDFS + docker load" approach and base the design on 
"docker pull"?

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Zhankun Tang
> Attachments: YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here, with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull during 
> the run/launch phase. 
> We also need the ability to clean up old/stale, unused images. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5428) Allow for specifying the docker client configuration directory

2016-08-02 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403503#comment-15403503
 ] 

Zhankun Tang edited comment on YARN-5428 at 8/2/16 7:29 AM:


Thanks for the patch, [~shaneku...@gmail.com]. One question:

I remember that "docker login" stores the credentials in ~/.docker/config.json 
by default. Will this patch eliminate the need for "docker login", or should the 
administrator store the credentials in the config.json file manually? 
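
For reference, the docker client's global {{--config}} flag points it at an 
alternate directory containing config.json, so pre-populated registry 
credentials can be used without a per-user "docker login". A minimal sketch, 
with an assumed config directory and image name:

{code:java}
// Illustration only: pull a private image using credentials from a centrally
// managed client config directory (which must contain config.json), instead of
// $HOME/.docker of whichever user happens to run the command.
public class DockerConfigDirSketch {
  public static void main(String[] args) throws Exception {
    String configDir = "/etc/hadoop/docker-client";      // assumed location
    String image = "registry.example.com/myapp:latest";  // assumed image
    int rc = new ProcessBuilder(
            "docker", "--config", configDir, "pull", image)
        .inheritIO().start().waitFor();
    if (rc != 0) {
      throw new RuntimeException("docker pull failed with exit code " + rc);
    }
  }
}
{code}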


was (Author: tangzhankun):
thanks for the patch, [~shaneku...@gmail.com].  Looks good to me.

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config to avoid having to run docker login on 
> each cluster member. 
> By default, the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching, and it may also be desirable to centralize this configuration 
> beyond a single user's home directory.
> Note that the command-line argument is for the configuration directory, NOT 
> the configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within the container executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5428) Allow for specifying the docker client configuration directory

2016-08-02 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403503#comment-15403503
 ] 

Zhankun Tang commented on YARN-5428:


Thanks for the patch, [~shaneku...@gmail.com]. Looks good to me.

> Allow for specifying the docker client configuration directory
> --
>
> Key: YARN-5428
> URL: https://issues.apache.org/jira/browse/YARN-5428
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-5428.001.patch, YARN-5428.002.patch, 
> YARN-5428.003.patch, YARN-5428.004.patch
>
>
> The docker client allows for specifying a configuration directory that 
> contains the docker client's configuration. It is common to store "docker 
> login" credentials in this config to avoid having to run docker login on 
> each cluster member. 
> By default, the docker client config is $HOME/.docker/config.json on Linux. 
> However, this does not work with the current container executor user 
> switching, and it may also be desirable to centralize this configuration 
> beyond a single user's home directory.
> Note that the command-line argument is for the configuration directory, NOT 
> the configuration file.
> This change will be needed to allow YARN to automatically pull images at 
> localization time or within the container executor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5310) AM restart failed because of the expired HDFS delegation tokens

2016-08-02 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403481#comment-15403481
 ] 

Xianyin Xin commented on YARN-5310:
---

Thanks, [~aw]. Do we have any good ideas for addressing this problem, then? 
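
One possible direction, sketched below with the stock {{UserGroupInformation}} 
and {{FileSystem}} APIs. Where such a refresh would hook into the AM restart 
path is exactly the open question here, and the user/renewer parameters are 
placeholders:

{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Illustration only: fetch fresh HDFS delegation tokens on behalf of the
// submitting user so a restarted attempt does not reuse expired tokens.
public class RefreshHdfsTokensSketch {
  public static Credentials refresh(final Configuration conf, String user,
      final String renewer) throws IOException, InterruptedException {
    final Credentials creds = new Credentials();
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
    proxyUgi.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws IOException {
        // New tokens for the default filesystem are added to creds.
        FileSystem.get(conf).addDelegationTokens(renewer, creds);
        return null;
      }
    });
    return creds;  // the caller would swap these in for the expired tokens
  }
}
{code}

Whether such a refresh can happen transparently on the server side, or has to 
come from the submitting client, is part of the open question.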

> AM restart failed because of the expired HDFS delegation tokens
> ---
>
> Key: YARN-5310
> URL: https://issues.apache.org/jira/browse/YARN-5310
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Xianyin Xin
>Assignee: Xianyin Xin
>
> For a long-running AM, a restart can fail because the token in the 
> ApplicationSubmissionContext has expired. We should update it when we obtain a 
> new delegation token on behalf of the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


