[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated

2017-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806893#comment-15806893
 ] 

Varun Saxena commented on YARN-3884:


[~bibinchundatt], the patch no longer applies cleanly.

> RMContainerImpl transition from RESERVED to KILL apphistory status not updated
> --
>
> Key: YARN-3884
> URL: https://issues.apache.org/jira/browse/YARN-3884
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: oct16-easy
> Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
> Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, 
> YARN-3884.0003.patch, YARN-3884.0004.patch, YARN-3884.0005.patch
>
>
> Setup
> ===
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===
> 1. Submit apps to Queue 1 with 512 mb and 1 core
> 2. Submit apps to Queue 2 with 512 mb and 5 cores
> Lots of containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to 
> RESERVED
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource= queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
> used= cluster=
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.96875 
> absoluteUsedCapacity=0.96875 used= 
> cluster=
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to 
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=AM Allocated ContainerTARGET=SchedulerApp 
> RESULT=SUCCESS  APPID=application_1435849994778_0001
> CONTAINERID=container_e24_1435849994778_0001_01_14
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_e24_1435849994778_0001_01_14 of capacity 
>  on host host-10-19-92-117:64318, which has 6 
> containers,  used and  available 
> after allocation
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1435849994778_0001_01 
> container=Container: [ContainerId: 
> container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, 
> NodeHttpAddress: host-10-19-92-117:65321, Resource: , 
> Priority: 20, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=, 
> usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, 
> numContainers=5 clusterResource=
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.default stats: default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=, 
> usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 
> used= cluster=
> 2015-07-02 20:45:32,143 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_14 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2015-07-02 20:45:32,174 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:

[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806891#comment-15806891
 ] 

Hadoop QA commented on YARN-3884:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  4s{color} 
| {color:red} YARN-3884 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-3884 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12840437/YARN-3884.0005.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14598/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> RMContainerImpl transition from RESERVED to KILL apphistory status not updated
> --
>
> Key: YARN-3884
> URL: https://issues.apache.org/jira/browse/YARN-3884
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
> Environment: Suse11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: oct16-easy
> Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
> Elapsed Time.jpg, Test Result-Container status.jpg, YARN-3884.0002.patch, 
> YARN-3884.0003.patch, YARN-3884.0004.patch, YARN-3884.0005.patch
>
>
> Setup
> ===
> 1 NM 3072 16 cores each
> Steps to reproduce
> ===
> 1. Submit apps to Queue 1 with 512 mb and 1 core
> 2. Submit apps to Queue 2 with 512 mb and 5 cores
> Lots of containers get reserved and unreserved in this case.
> {code}
> 2015-07-02 20:45:31,169 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to 
> RESERVED
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> Reserved container  application=application_1435849994778_0002 
> resource= queue=QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=, 
> usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
> numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
> used= cluster=
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
> absoluteCapacity=0.4, usedResources=, 
> usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
> numContainers=6
> 2015-07-02 20:45:31,170 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> assignedContainer queue=root usedCapacity=0.96875 
> absoluteUsedCapacity=0.96875 used= 
> cluster=
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to 
> ALLOCATED
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
> OPERATION=AM Allocated ContainerTARGET=SchedulerApp 
> RESULT=SUCCESS  APPID=application_1435849994778_0001
> CONTAINERID=container_e24_1435849994778_0001_01_14
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
> Assigned container container_e24_1435849994778_0001_01_14 of capacity 
>  on host host-10-19-92-117:64318, which has 6 
> containers,  used and  available 
> after allocation
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignedContainer application attempt=appattempt_1435849994778_0001_01 
> container=Container: [ContainerId: 
> container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, 
> NodeHttpAddress: host-10-19-92-117:65321, Resource: , 
> Priority: 20, Token: null, ] queue=default: capacity=0.2, 
> absoluteCapacity=0.2, usedResources=, 
> usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, 
> numContainers=5 clusterResource=
> 2015-07-02 20:45:31,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Re-sorting assigned queue: 

[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806868#comment-15806868
 ] 

Varun Saxena commented on YARN-6068:


Thanks [~djp] for raising the issue. In fact, we saw the exact same issue in our 
clusters last night.
The changes as such look fine to me.
In the patch, we have added an additional log ("Log aggregation abort for 
application  due to NM restart"). I don't think this is required; we already 
print a log when we call AppLogAggregatorImpl#abortLogAggregation, and that 
should be good enough.

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6068.patch
>
>
> The exception log is as following:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806781#comment-15806781
 ] 

Allen Wittenauer edited comment on YARN-6060 at 1/7/17 4:57 AM:


bq. Do we have a bug in the existing code in this case?

Yes.

bq.  Is this what you meant by mentioning pkgsrc?

Yes, although I misspoke and should have said ports.

For reference:

https://svnweb.freebsd.org/ports/head/devel/hadoop2/Makefile?view=markup#l74





was (Author: aw):
bq. Do we have a bug in the existing code in this case?

Yes.

bq.  Is this what you meant by mentioning pkgsrc?

Yes, although I misspoke and should have said ports.




> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806781#comment-15806781
 ] 

Allen Wittenauer commented on YARN-6060:


bq. Do we have a bug in the existing code in this case?

Yes.

bq.  Is this what you meant by mentioning pkgsrc?

Yes, although I misspoke and should have said ports.




> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806769#comment-15806769
 ] 

Hadoop QA commented on YARN-6068:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 14s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 3 new + 15 unchanged - 0 fixed = 18 total (was 15) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
49s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6068 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846158/YARN-6068.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 35b9e0478b31 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / a59df15 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14597/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/14597/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14597/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 

[jira] [Updated] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6068:
-
Attachment: YARN-6068.patch

Uploaded a quick patch to fix it. It should be straightforward enough to go in without a UT.

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6068.patch
>
>
> The exception log is as following:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6068:
-
Target Version/s: 2.8.1  (was: 2.9.0)

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The exception log is as following:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806672#comment-15806672
 ] 

Miklos Szegedi commented on YARN-6060:
--

Thank you, [~aw]. Please remember that I still have the {{#ifdef __linux}} set. 
However, I have to refer to this code again:
{code}
public UnixShellScriptBuilder(){
  line("#!/bin/bash");
  line();
}
{code}
Do we have a bug in the existing code in this case? Is this what you meant by 
mentioning pkgsrc?
I am assuming this could help there:
{code}
public UnixShellScriptBuilder(){
  line("#!/usr/bin/env bash");
  line();
}
{code}


> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6009) RM fails to start during an upgrade - Failed to load/recover state (YarnException: Invalid application timeout, value=0 for type=LIFETIME)

2017-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806654#comment-15806654
 ] 

Hudson commented on YARN-6009:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11083 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11083/])
YARN-6009. Skip validating app timeout value on recovery. Contributed by 
(jianhe: rev 020316458dfe6059b700f8d93a9791e4cb817b3f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
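
For readers following the change: per the commit message, validation is now 
skipped on recovery, so applications persisted with a LIFETIME timeout of 0 no 
longer fail RM startup. Below is a minimal, self-contained sketch of that idea 
only; the class, method names, and the isRecovery flag are illustrative and not 
the actual RMAppManager code.

{code}
import java.util.Map;

/**
 * Illustration only: timeout validation is applied to new submissions but
 * skipped for applications re-created during RM state-store recovery.
 */
public class TimeoutValidationSketch {

  /** Rejects non-positive timeouts, mirroring the check that failed recovery. */
  static void validateApplicationTimeouts(Map<String, Long> timeouts) {
    for (Map.Entry<String, Long> e : timeouts.entrySet()) {
      if (e.getValue() <= 0) {
        throw new IllegalArgumentException("Invalid application timeout, value="
            + e.getValue() + " for type=" + e.getKey());
      }
    }
  }

  static void createAndPopulateNewRMApp(Map<String, Long> timeouts,
      boolean isRecovery) {
    if (!isRecovery) {
      // Only newly submitted applications are validated; recovered apps are
      // accepted as-is so an upgrade does not break RM startup.
      validateApplicationTimeouts(timeouts);
    }
    // ... continue building the RMApp ...
  }

  public static void main(String[] args) {
    // An app persisted before the upgrade with LIFETIME=0 recovers fine,
    // while the same value is still rejected for a fresh submission.
    createAndPopulateNewRMApp(Map.of("LIFETIME", 0L), true);
    try {
      createAndPopulateNewRMApp(Map.of("LIFETIME", 0L), false);
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}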


> RM fails to start during an upgrade - Failed to load/recover state 
> (YarnException: Invalid application timeout, value=0 for type=LIFETIME)
> --
>
> Key: YARN-6009
> URL: https://issues.apache.org/jira/browse/YARN-6009
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Gour Saha
>Assignee: Rohith Sharma K S
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6009.01.patch
>
>
> ResourceManager fails to start during an upgrade with the following 
> exceptions - 
> Exception 1:
> {color:red}
> {code}
> 2016-12-09 14:57:23,508 INFO  capacity.CapacityScheduler 
> (CapacityScheduler.java:initScheduler(328)) - Initialized CapacityScheduler 
> with calculator=class 
> org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, 
> minimumAllocation=<>, 
> maximumAllocation=<>, asynchronousScheduling=false, 
> asyncScheduleInterval=5ms
> 2016-12-09 14:57:23,509 WARN  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:becomeActive(863)) - Exception handling the 
> winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:129)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:859)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:463)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when 
> transitioning to Active mode
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:318)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127)
> ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: 
> org.apache.hadoop.yarn.exceptions.YarnException: Invalid application timeout, 
> value=0 for type=LIFETIME
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:991)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1032)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1028)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:313)
> ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Invalid 
> application timeout, value=0 for type=LIFETIME
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateApplicationTimeouts(RMServerUtils.java:305)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:365)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:330)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:463)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1184)
> at 
> 

[jira] [Commented] (YARN-6015) AsyncDispatcher thread name can be set to improved debugging

2017-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806653#comment-15806653
 ] 

Hudson commented on YARN-6015:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11083 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11083/])
YARN-6015. AsyncDispatcher thread name can be set to improved debugging. 
(naganarasimha_gr: rev a59df15757fac7f917cb96fc8fcfeb7017475e4f)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/AbstractSystemMetricsPublisher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
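
The gist of the change is to let each component hand its dispatcher a 
distinguishable thread name. A minimal, self-contained sketch of that idea is 
below; it is not the actual AsyncDispatcher code, and the constructor and 
naming scheme are assumptions for illustration.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Illustration only: a dispatcher whose event thread carries a caller-supplied name. */
public class NamedDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  private final Thread eventThread;

  NamedDispatcherSketch(String dispatcherName) {
    eventThread = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          eventQueue.take().run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    // The thread name is what shows up in jstack output, so a busy or stuck
    // dispatcher thread can be attributed to its owning component at a glance.
    eventThread.setName(dispatcherName + " Event Dispatcher");
    eventThread.setDaemon(true);
  }

  void start() { eventThread.start(); }

  void dispatch(Runnable event) { eventQueue.add(event); }

  public static void main(String[] args) throws InterruptedException {
    NamedDispatcherSketch rmDispatcher = new NamedDispatcherSketch("ResourceManager");
    rmDispatcher.start();
    rmDispatcher.dispatch(() ->
        System.out.println("handled on " + Thread.currentThread().getName()));
    Thread.sleep(200); // give the daemon dispatcher thread time to print
  }
}
{code}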


> AsyncDispatcher thread name can be set to improved debugging
> 
>
> Key: YARN-6015
> URL: https://issues.apache.org/jira/browse/YARN-6015
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ajith S
>Assignee: Ajith S
> Attachments: YARN-6015.01.patch, YARN-6015.02.patch
>
>
> Currently all the running instances of AsyncDispatcher have same thread name. 
> To improve debugging, we can have option to set thread name



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6015) AsyncDispatcher thread name can be set to improved debugging

2017-01-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806640#comment-15806640
 ] 

Naganarasimha G R commented on YARN-6015:
-

Hi [~ajithshetty], I have committed it to trunk, but it fails to apply to 
branch-2. Can you please rebase the patch (dropping the changes for the 2 new 
trunk-only files and for SystemMetricsPublisher.java)?

> AsyncDispatcher thread name can be set to improved debugging
> 
>
> Key: YARN-6015
> URL: https://issues.apache.org/jira/browse/YARN-6015
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ajith S
>Assignee: Ajith S
> Attachments: YARN-6015.01.patch, YARN-6015.02.patch
>
>
> Currently all the running instances of AsyncDispatcher have same thread name. 
> To improve debugging, we can have option to set thread name



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806626#comment-15806626
 ] 

Junping Du commented on YARN-6068:
--

In YARN-4325, we added sending out an aggregation failure event to get rid of 
app-leak issues in the NM state store. However, we forgot one case: log 
aggregation can abort rather than finish when the NM gets restarted. In this 
case, we shouldn't send the aggregation failure event.
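
To make that concrete, here is a minimal, self-contained sketch of the kind of 
guard being described; it is not the actual AppLogAggregatorImpl code, and the 
aborted flag and event names are illustrative only.

{code}
/**
 * Illustration only: distinguish an aborted aggregation (NM restarting with
 * recovery) from a failed one, and raise the failure event only for the latter.
 */
public class LogAggregationAbortSketch {

  enum AppEvent { LOG_HANDLING_DONE, LOG_HANDLING_FAILED }

  interface EventHandler { void handle(AppEvent event); }

  private volatile boolean aborted = false;
  private final EventHandler appEventHandler;

  LogAggregationAbortSketch(EventHandler appEventHandler) {
    this.appEventHandler = appEventHandler;
  }

  /** Called when the NM is shutting down for a restart with recovery enabled. */
  void abortLogAggregation() {
    aborted = true;
  }

  void finishLogAggregation(boolean uploadSucceeded) {
    if (aborted) {
      // The application is still RUNNING and will be recovered after restart,
      // so a failure event here would hit an invalid state transition
      // (APPLICATION_LOG_HANDLING_FAILED at RUNNING). Return quietly instead.
      return;
    }
    appEventHandler.handle(uploadSucceeded
        ? AppEvent.LOG_HANDLING_DONE : AppEvent.LOG_HANDLING_FAILED);
  }

  public static void main(String[] args) {
    LogAggregationAbortSketch aggregator =
        new LogAggregationAbortSketch(e -> System.out.println("event: " + e));
    aggregator.abortLogAggregation();      // simulate NM restart
    aggregator.finishLogAggregation(false); // no failure event is dispatched
  }
}
{code}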

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> The exception log is as following:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-06 Thread Junping Du (JIRA)
Junping Du created YARN-6068:


 Summary: Log aggregation get failed when NM restart even with 
recovery
 Key: YARN-6068
 URL: https://issues.apache.org/jira/browse/YARN-6068
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical


The exception log is as following:
{noformat}
2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log aggregation 
for application_1483640789847_0001
2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
(AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
application application_1483640789847_0001
2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
(ApplicationImpl.java:handle(461)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
APPLICATION_LOG_HANDLING_FAILED at RUNNING
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
(ApplicationImpl.java:handle(464)) - Application application_1483640789847_0001 
transitioned from RUNNING to null
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5936) when cpu strict mode is closed, yarn couldn't assure scheduling fairness between containers

2017-01-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806611#comment-15806611
 ] 

Miklos Szegedi commented on YARN-5936:
--

In the latest test I used 100 threads per program; I just did not share the 
code. They run in parallel, so the sum of the time command results measures 
whether the whole set spent time in additional CPU cycles beyond the activity 
loop. The reason I checked is to ask whether you would like a solution that 
uses {{cpu.cfs_quota_us}}.
I could imagine a dynamic cfs algorithm like the following.
A timer callback with a certain period could do:
{code}
if CPU is saturated
  for each container
    if previous usage > fair share
      limit to fair share
else
  release all limits
{code}
It has drawbacks. It only works with a saturated CPU, when not much time is 
spent waiting on I/O. It has a delay, since it works on historic data. This 
also means that it adds some utilization loss, which can be larger with 
multiple cores. On the other hand, it provides the requested fairness when the 
CPU is saturated.
Does your node have multiple cores? The algorithm may not help much in that 
case. For example, with 8 cores, one container runs 8 threads and the other 
runs 2 threads. The fair share requested is 50%-50%. Without throttling the two 
containers will share 80%-20%. Even if we set the fair share by throttling, 
when the cores are saturated the usage will be 50%/25% once the quota is 
applied, so there is a utilization loss for a period. From there the algorithm 
may get more complicated...
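
To illustrate the timer-based idea, here is a minimal, self-contained sketch. 
The sampling period, the saturation check, and the way usage shares are 
obtained are assumptions; a real implementation would write the computed value 
into each container's cpu.cfs_quota_us cgroup file instead of printing it.

{code}
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustration only: periodically throttle containers that exceeded their fair share. */
public class DynamicCfsQuotaSketch {

  static final long CFS_PERIOD_US = 100_000; // 100 ms, a typical cpu.cfs_period_us

  /** Returns the cpu.cfs_quota_us to apply for one container; -1 means "no limit". */
  static long decideQuota(boolean cpuSaturated, double prevUsageShare, double fairShare) {
    if (!cpuSaturated) {
      return -1; // release all limits while there is spare CPU
    }
    // Throttle only containers that ran above their fair share in the last window.
    return prevUsageShare > fairShare ? (long) (fairShare * CFS_PERIOD_US) : -1;
  }

  public static void main(String[] args) {
    // Hypothetical usage shares sampled over the previous period.
    Map<String, Double> prevUsage = Map.of("container_01", 0.8, "container_02", 0.2);
    Map<String, Double> fairShare = Map.of("container_01", 0.5, "container_02", 0.5);

    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    timer.scheduleAtFixedRate(() -> {
      boolean cpuSaturated = true; // would come from sampling /proc/stat in practice
      prevUsage.forEach((id, usage) -> {
        long quota = decideQuota(cpuSaturated, usage, fairShare.get(id));
        System.out.println(id + " -> cfs_quota_us=" + quota);
      });
    }, 0, 1, TimeUnit.SECONDS);

    timer.schedule(timer::shutdown, 3, TimeUnit.SECONDS); // stop the demo after 3s
  }
}
{code}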

> when cpu strict mode is closed, yarn couldn't assure scheduling fairness 
> between containers
> ---
>
> Key: YARN-5936
> URL: https://issues.apache.org/jira/browse/YARN-5936
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1
>Reporter: zhengchenyu
>Priority: Critical
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When using LinuxContainer, setting 
> "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" to 
> true can assure scheduling fairness via the cpu bandwidth control of cgroups. 
> But the cgroup cpu bandwidth control leads to bad performance in our 
> experience. Without it, cpu.share of cgroups is our only way to assure 
> scheduling fairness, but it is not completely effective. For example, there 
> are two containers with the same vcores (meaning the same cpu.share); one 
> container is single-threaded, the other is multi-threaded. The multi-threaded 
> one gets more CPU time, which is unreasonable!
> Here is my test case, I submit two distributedshell application. And two 
> commmand are below:
> {code}
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 10 -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 1  -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> {code}
>  Here is the cpu time of the two containers:
> {code}
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 15448 yarn  20   0 9059592  28336   9180 S 998.7  0.1  24:09.30 java
> 15026 yarn  20   0 9050340  27480   9188 S 100.0  0.1   3:33.97 java
> 13767 yarn  20   0 1799816 381208  18528 S   4.6  1.2   0:30.55 java
>77 root  rt   0   0  0  0 S   0.3  0.0   0:00.74 
> migration/1   
> {code}
> We find the cpu time of the multi-threaded container is ten times that of the 
> single-threaded one, though the two containers have the same cpu.share.
> notes:
> run.sh
> {code} 
>   java -cp /home/yarn/loop.jar:$CLASSPATH loop.loop $1
> {code} 
> loop.java
> {code} 
> package loop;
> public class loop {
>   public static void main(String[] args) {
>   // TODO Auto-generated method stub
>   int loop = 1;
>   if(args.length>=1) {
>   System.out.println(args[0]);
>   loop = Integer.parseInt(args[0]);
>   }
>   for(int i=0;i<loop;i++) {
>   System.out.println("start thread " + i);
>   new Thread(new Runnable() {
> 

[jira] [Commented] (YARN-5864) YARN Capacity Scheduler - Queue Priorities

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806595#comment-15806595
 ] 

Hadoop QA commented on YARN-5864:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
26s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 156 new + 1648 unchanged - 20 fixed = 1804 total (was 1668) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  1m 
51s{color} | {color:red} hadoop-yarn-project_hadoop-yarn generated 3 new + 6465 
unchanged - 0 fixed = 6468 total (was 6465) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
28s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 3 new + 913 unchanged - 0 fixed = 916 total (was 913) {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 24m 22s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 31s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m  7s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens |
|   | hadoop.yarn.server.timeline.webapp.TestTimelineWebServices |
|   | 

[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806581#comment-15806581
 ] 

Allen Wittenauer commented on YARN-6060:


No, because now you just broke FreeBSD and likely other OSes.  (pkgsrc patches 
the broken Java code.  We should really fix that for them.)

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6066) Opportunistic containers minor fixes: API annotations and config parameter changes

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806497#comment-15806497
 ] 

Hadoop QA commented on YARN-6066:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 49s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 48s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 27s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 35s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 42s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 24s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 26s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 39s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 55s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 29s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 29s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 29s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  1m 39s{color} | {color:orange} root: The patch generated 3 new + 835 unchanged - 5 fixed = 838 total (was 840) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  5s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 14s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 38s{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 54s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 23s{color} | {color:green} hadoop-yarn-site in 

[jira] [Commented] (YARN-5556) Support for deleting queues without requiring a RM restart

2017-01-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806480#comment-15806480
 ] 

Naganarasimha G R commented on YARN-5556:
-

Thanks [~wangda] & [~xgong], 
bq. I think we don't need the additional DELETED state, first it generate some 
maintenance overheads, for example we need to maintain state transition to/from 
of the DELETED state. And since by design a queue can be deleted only if queue 
is stopped and no app running, so the impact of typo should be minimum. Our 
preference is simply remove queue from config.
This makes things pretty clear and straightforward for users/admins, and 
clears all my queries. I will modify and upload the patch at the earliest.

> Support for deleting queues without requiring a RM restart
> --
>
> Key: YARN-5556
> URL: https://issues.apache.org/jira/browse/YARN-5556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Xuan Gong
>Assignee: Naganarasimha G R
> Attachments: YARN-5556.v1.001.patch, YARN-5556.v1.002.patch, 
> YARN-5556.v1.003.patch, YARN-5556.v1.004.patch
>
>
> Today, we can add or modify queues without restarting the RM, via a CS 
> refresh. But to delete a queue, we have to restart the ResourceManager. We 
> could support deleting queues without requiring an RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4466) ResourceManager should tolerate unexpected exceptions to happen in non-critical subsystem/services like SystemMetricsPublisher

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4466:
-
Target Version/s:   (was: 2.8.0)

> ResourceManager should tolerate unexpected exceptions to happen in 
> non-critical subsystem/services like SystemMetricsPublisher
> --
>
> Key: YARN-4466
> URL: https://issues.apache.org/jira/browse/YARN-4466
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>
> From my comment in 
> YARN-4452(https://issues.apache.org/jira/browse/YARN-4452?focusedCommentId=15059805=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15059805),
>  we should make the RM more robust by ignoring (but logging) unexpected exceptions 
> in its non-critical subsystems/services.
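
As a rough illustration of the log-and-continue pattern being proposed (a minimal sketch; the helper class, its method, and the SLF4J logging are illustrative assumptions, not existing RM code):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Hypothetical helper: run a non-critical action (e.g. publishing system metrics)
 * and log any unexpected exception instead of letting it propagate and hurt the RM.
 */
public final class NonCriticalRunner {
  private static final Logger LOG =
      LoggerFactory.getLogger(NonCriticalRunner.class);

  private NonCriticalRunner() {
  }

  public static void runQuietly(String what, Runnable action) {
    try {
      action.run();
    } catch (Exception e) {
      // Log and continue: failures in non-critical subsystems should not be fatal.
      LOG.error("Unexpected exception in non-critical component: " + what, e);
    }
  }
}
{code}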



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6050) AMs can't be scheduled on racks or nodes

2017-01-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806464#comment-15806464
 ] 

Robert Kanter commented on YARN-6050:
-

FYI: I'm going to be on vacation for a few weeks, but we can continue 
discussing this and I can continue working on this when I get back.  
Alternatively, if we're happy with the 005 patch, we can commit that while I'm 
away :)

> AMs can't be scheduled on racks or nodes
> 
>
> Key: YARN-6050
> URL: https://issues.apache.org/jira/browse/YARN-6050
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-6050.001.patch, YARN-6050.002.patch, 
> YARN-6050.003.patch, YARN-6050.004.patch, YARN-6050.005.patch
>
>
> Yarn itself supports rack/node aware scheduling for AMs; however, there 
> currently are two problems:
> # To specify hard or soft rack/node requests, you have to specify more than 
> one {{ResourceRequest}}.  For example, if you want to schedule an AM only on 
> "rackA", you have to create two {{ResourceRequest}}s, like this (a fuller 
> sketch follows this description):
> {code}
> ResourceRequest.newInstance(PRIORITY, ANY, CAPABILITY, NUM_CONTAINERS, false);
> ResourceRequest.newInstance(PRIORITY, "rackA", CAPABILITY, NUM_CONTAINERS, 
> true);
> {code}
> The problem is that the YARN API doesn't actually allow you to specify more 
> than one {{ResourceRequest}} in the {{ApplicationSubmissionContext}}.  The 
> current behavior is to build one either from {{getResource}} or directly from 
> {{getAMContainerResourceRequest}}, depending on whether 
> {{getAMContainerResourceRequest}} is null.  We'll need to add a third 
> method, say {{getAMContainerResourceRequests}}, which takes a list of 
> {{ResourceRequest}}s so that clients can specify multiple resource 
> requests.
> # There are some places where things are hardcoded to overwrite what the 
> client specifies.  These are pretty straightforward to fix.
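
The sketch referenced above, spelled out with the public {{ResourceRequest}} factory method; the priority, memory, and vcore values are arbitrary placeholders, and whether the RM honours such a list for the AM is exactly what this issue is about.

{code}
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AmRackOnlyRequests {
  /** Build the pair of requests needed to pin an AM to a single rack. */
  public static List<ResourceRequest> forRack(String rack) {
    Priority amPriority = Priority.newInstance(0);        // placeholder priority
    Resource amResource = Resource.newInstance(1024, 1);  // placeholder: 1 GB, 1 vcore

    // Off-switch (ANY) request with relaxLocality=false: do not fall back off the rack.
    ResourceRequest any = ResourceRequest.newInstance(
        amPriority, ResourceRequest.ANY, amResource, 1, false);

    // Rack-level request with relaxLocality=true: any node within the rack is fine.
    ResourceRequest rackLevel = ResourceRequest.newInstance(
        amPriority, rack, amResource, 1, true);

    return Arrays.asList(any, rackLevel);
  }
}
{code}

A list like this is what the proposed {{getAMContainerResourceRequests}} would let clients attach to the {{ApplicationSubmissionContext}}.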



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4351) Tests in h.y.c.TestGetGroups get failed on trunk

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4351:
-
Target Version/s:   (was: 2.8.0)

> Tests in h.y.c.TestGetGroups get failed on trunk
> 
>
> Key: YARN-4351
> URL: https://issues.apache.org/jira/browse/YARN-4351
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Junping Du
>
> From test report: 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/testReport/, we can 
> see there are several test failures for TestGetGroups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4160) Dynamic NM Resources Configuration file should be simplified.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4160:
-
Target Version/s:   (was: 2.8.0)

> Dynamic NM Resources Configuration file should be simplified.
> -
>
> Key: YARN-4160
> URL: https://issues.apache.org/jira/browse/YARN-4160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> In YARN-313, we provide a CLI to refresh NMs' resources dynamically. The format 
> of dynamic-resources.xml is something like the following:
> {noformat}
> <configuration>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.vcores</name>
>     <value>16</value>
>   </property>
>   <property>
>     <name>yarn.resource.dynamic.node_id_1.memory</name>
>     <value>1024</value>
>   </property>
> </configuration>
> {noformat}
> Per the review comments on YARN-313, this looks too redundant. We should have a 
> better, more concise format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4430) registry security validation can fail when downgrading to insecure would work

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4430:
-
Target Version/s:   (was: 2.8.0)

> registry security validation can fail when downgrading to insecure would work
> -
>
> Key: YARN-4430
> URL: https://issues.apache.org/jira/browse/YARN-4430
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
> Environment: Kerberos, java  7 & 8
>Reporter: Steve Loughran
>
> The Registry ZK client code does a pre-emptive validation of registry 
> security settings, in the interest of being helpful when things fail.
> But: sometimes it fails fast (see stack trace in SLIDER-993) when it could 
> just warn and continue, because if the registry is only being read, skipping 
> SASL auth works



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1874) Cleanup: Move RMActiveServices out of ResourceManager into its own file

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1874:
-
Target Version/s:   (was: 2.8.0)

> Cleanup: Move RMActiveServices out of ResourceManager into its own file
> ---
>
> Key: YARN-1874
> URL: https://issues.apache.org/jira/browse/YARN-1874
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Karthik Kambatla
>Assignee: Tsuyoshi Ozawa
> Attachments: YARN-1874.1.patch, YARN-1874.2.patch, YARN-1874.3.patch, 
> YARN-1874.4.patch
>
>
> As [~vinodkv] noticed on YARN-1867, ResourceManager is hard to maintain. We 
> should move RMActiveServices out to make it more manageable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2532) Track pending resources at the application level

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2532:
-
Target Version/s:   (was: 2.8.0)

> Track pending resources at the application level 
> -
>
> Key: YARN-2532
> URL: https://issues.apache.org/jira/browse/YARN-2532
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>
> SchedulerApplicationAttempt keeps track of current consumption of an app. It 
> would be nice to have a similar value tracked for pending requests. 
> The immediate uses I see are: (1) Showing this on the Web UI (YARN-2333) and 
> (2) updating demand in FS in an event-driven style (YARN-2353)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1948) Expose utility methods in Apps.java publically

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1948:
-
Target Version/s:   (was: 2.8.0)

> Expose utility methods in Apps.java publically
> --
>
> Key: YARN-1948
> URL: https://issues.apache.org/jira/browse/YARN-1948
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Sandy Ryza
>Assignee: nijel
>  Labels: newbie
> Attachments: YARN-1948-1.patch
>
>
> Apps.setEnvFromInputString and Apps.addToEnvironment are methods used by 
> MapReduce, Spark, and Tez that are currently marked private.  As these are 
> useful for any YARN app that wants to allow users to augment container 
> environments, it would be helpful to make them public.
> It may make sense to put them in a new class with a better name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1747) Better physical memory monitoring for containers

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1747:
-
Target Version/s:   (was: 2.8.0)

> Better physical memory monitoring for containers
> 
>
> Key: YARN-1747
> URL: https://issues.apache.org/jira/browse/YARN-1747
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>
> YARN currently uses RSS to compute the physical memory being used by a 
> container. This can lead to issues, as noticed in HDFS-5957.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2268:
-
Target Version/s:   (was: 2.8.0)

> Disallow formatting the RMStateStore when there is an RM running
> 
>
> Key: YARN-2268
> URL: https://issues.apache.org/jira/browse/YARN-2268
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-2268.patch
>
>
> YARN-2131 adds a way to format the RMStateStore. However, it can be a problem 
> if we format the store while an RM is actively using it. It would be nice to 
> fail the format if there is an RM running and using this store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4616) Default RM retry interval (30s) is too long

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4616:
-
Target Version/s:   (was: 2.8.0)

> Default RM retry interval (30s) is too long
> ---
>
> Key: YARN-4616
> URL: https://issues.apache.org/jira/browse/YARN-4616
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
>
> I think the default 30s for the RM retry interval is too long.
> The default node-heartbeat-interval is only 1s 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2162) Fair Scheduler :ability to optionally configure minResources and maxResources in terms of percentage

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2162:
-
Target Version/s:   (was: 2.8.0)

> Fair Scheduler :ability to optionally configure minResources and maxResources 
> in terms of percentage
> 
>
> Key: YARN-2162
> URL: https://issues.apache.org/jira/browse/YARN-2162
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ashwin Shankar
>  Labels: scheduler
>
> minResources and maxResources in fair scheduler configs are expressed in 
> terms of absolute numbers (X MB, Y vcores). 
> As a result, when we expand or shrink our Hadoop cluster, we need to 
> recalculate and change minResources/maxResources accordingly, which is pretty 
> inconvenient.
> We can circumvent this problem if we can optionally configure these 
> properties in terms of percentage of cluster capacity. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3806) Proposal of Generic Scheduling Framework for YARN

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3806:
-
Target Version/s:   (was: 2.8.0)

> Proposal of Generic Scheduling Framework for YARN
> -
>
> Key: YARN-3806
> URL: https://issues.apache.org/jira/browse/YARN-3806
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wei Shao
> Attachments: ProposalOfGenericSchedulingFrameworkForYARN-V1.05.pdf, 
> ProposalOfGenericSchedulingFrameworkForYARN-V1.06.pdf
>
>
> Currently, a typical YARN cluster runs many different kinds of applications: 
> production applications, ad hoc user applications, long running services and 
> so on. Different YARN scheduling policies may be suitable for different 
> applications. For example, capacity scheduling can manage production 
> applications well since applications can get a guaranteed resource share, while fair 
> scheduling can manage ad hoc user applications well since it can enforce 
> fairness among users. However, the current YARN scheduling framework doesn’t have 
> a mechanism for multiple scheduling policies to work hierarchically in one 
> cluster.
> YARN-3306 talked about many issues of today’s YARN scheduling framework, and 
> proposed a per-queue policy driven framework. In detail, it supported 
> different scheduling policies for leaf queues. However, support for different 
> scheduling policies for upper-level queues has not been seriously considered yet. 
> A generic scheduling framework is proposed here to address these limitations. 
> It supports different policies (fair, capacity, fifo and so on) for any queue 
> consistently. The proposal tries to solve many other issues in current YARN 
> scheduling framework as well.
> Two new proposed scheduling policies YARN-3807 & YARN-3808 are based on 
> generic scheduling framework brought up in this proposal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2024:
-
Target Version/s: 2.9.0  (was: 2.8.0)

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1878) Yarn standby RM taking long to transition to active

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1878:
-
Target Version/s:   (was: 2.8.0)

> Yarn standby RM taking long to transition to active
> ---
>
> Key: YARN-1878
> URL: https://issues.apache.org/jira/browse/YARN-1878
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.0
>Reporter: Arpit Gupta
>Assignee: Xuan Gong
> Attachments: YARN-1878.1.patch
>
>
> In our HA tests we are noticing that sometimes it can take up to 10s for the 
> standby RM to transition to active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3808) Proposal of Time Extended Fair Scheduling for YARN

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3808:
-
Target Version/s:   (was: 2.8.0)

> Proposal of Time Extended Fair Scheduling for YARN
> --
>
> Key: YARN-3808
> URL: https://issues.apache.org/jira/browse/YARN-3808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, scheduler
>Reporter: Wei Shao
> Attachments: ProposalOfTimeExtendedFairSchedulingForYARN-V1.03.pdf
>
>
> This proposal talks about the issues of YARN fair scheduling policy, and 
> tries to solve them by YARN-3806 and the new scheduling policy called time 
> extended fair scheduling.
> The time extended fair scheduling policy is proposed to enforce fairness over 
> time among users. For example, if two users share the cluster weekly, each 
> user’s fair share is half of the cluster per week. In a particular week, if 
> the first user has used the whole cluster for the first half of the week, then in 
> the second half of the week the second user will always have priority to use cluster 
> resources, since the first user has already used up its time extended fair share of 
> the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3608) Apps submitted to MiniYarnCluster always stay in ACCEPTED state.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3608:
-
Target Version/s:   (was: 2.8.0)

> Apps submitted to MiniYarnCluster always stay in ACCEPTED state.
> 
>
> Key: YARN-3608
> URL: https://issues.apache.org/jira/browse/YARN-3608
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.6.0
>Reporter: Spandan Dutta
>Assignee: Tsuyoshi Ozawa
>
> So I adapted a test case to submit a yarn app to a MiniYarnCluster and wait 
> for it to reach running state. Turns out that the app gets stuck in 
> "ACCEPTED" state. 
> {noformat}
>  @Test
>   public void testGetAllQueues() throws IOException, YarnException, 
> InterruptedException {
> MiniYARNCluster cluster = new MiniYARNCluster("testMRAMTokens", 1, 1, 1);
> YarnClient rmClient = null;
> try {
>   cluster.init(new YarnConfiguration());
>   cluster.start();
>   final Configuration yarnConf = cluster.getConfig();
>   rmClient = YarnClient.createYarnClient();
>   rmClient.init(yarnConf);
>   rmClient.start();
>   YarnClientApplication newApp = rmClient.createApplication();
>   ApplicationId appId = 
> newApp.getNewApplicationResponse().getApplicationId();
>   // Create launch context for app master
>   ApplicationSubmissionContext appContext
>   = Records.newRecord(ApplicationSubmissionContext.class);
>   // set the application id
>   appContext.setApplicationId(appId);
>   // set the application name
>   appContext.setApplicationName("test");
>   // Set up the container launch context for the application master
>   ContainerLaunchContext amContainer
>   = Records.newRecord(ContainerLaunchContext.class);
>   appContext.setAMContainerSpec(amContainer);
>   appContext.setResource(Resource.newInstance(1024, 1));
>   // Submit the application to the applications manager
>   rmClient.submitApplication(appContext);
>   ApplicationReport applicationReport =
>   rmClient.getApplicationReport(appContext.getApplicationId());
>   int timeout = 10;
>   while(timeout > 0 && applicationReport.getYarnApplicationState() !=
>   YarnApplicationState.RUNNING) {
> Thread.sleep(5 * 1000);
> timeout--;
>   }
>   Assert.assertTrue(timeout != 0);
>   Assert.assertTrue(applicationReport.getYarnApplicationState()
>   == YarnApplicationState.RUNNING);
>   List<QueueInfo> queues = rmClient.getAllQueues();
>   Assert.assertNotNull(queues);
>   Assert.assertTrue(!queues.isEmpty());
>   QueueInfo queue = queues.get(0);
>   List<ApplicationReport> queueApplications = queue.getApplications();
>   Assert.assertFalse(queueApplications.isEmpty());
> } catch (YarnException e) {
>   Assert.assertTrue(e.getMessage().contains("Failed to submit"));
> } finally {
>   if (rmClient != null) {
> rmClient.stop();
>   }
>   cluster.stop();
> }
>   }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3883) YarnClient.getApplicationReport() doesn't not give diagnostics for the FINISHED state applications some times

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3883:
-
Target Version/s:   (was: 2.8.0)

> YarnClient.getApplicationReport() doesn't not give diagnostics for the 
> FINISHED state applications some times 
> --
>
> Key: YARN-3883
> URL: https://issues.apache.org/jira/browse/YARN-3883
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Devaraj K
>Assignee: Brahma Reddy Battula
>
> YarnClient.getApplicationReport() sometimes doesn't give diagnostics for 
> applications in the FINISHED state. 
> Below is the report from YarnClient.getApplicationReport(). It 
> doesn't show the diagnostics for an application whose FinalStatus is 
> FAILED and YarnApplicationState is FINISHED.
> {code:xml}
> 15/07/03 15:53:27 INFO yarn.Client:
>  client token: N/A
>  diagnostics: N/A
>  ApplicationMaster host: XX.XXX.XX.XX
>  ApplicationMaster RPC port: 0
>  queue: default
>  start time: 1435918986890
>  final status: FAILED
>  tracking URL: 
> http://stobdtserver2:8088/proxy/application_1435848120635_0015/
>  user: root
> {code}
> But we can see the Diagnostics information in the RM Web UI for the same 
> application.
> {code:xml}
> YarnApplicationState: FINISHED
> Queue:default
> FinalStatus Reported by AM:   FAILED
> Started:  Fri Jul 03 15:53:06 +0530 2015
> Elapsed:  20sec
> Tracking URL: History
> Log Aggregation StatusDISABLED
> Diagnostics:  User class threw exception: java.lang.NumberFormatException: 
> For input string: "xx"
> {code}
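
For reference, a minimal sketch of the client-side call being discussed; the application id is a placeholder, and an empty or "N/A" result from {{getDiagnostics()}} is exactly the behaviour reported here.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DiagnosticsProbe {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    try {
      // Placeholder id; in practice this comes from the submitted application.
      ApplicationId appId = ApplicationId.newInstance(1435848120635L, 15);
      ApplicationReport report = yarnClient.getApplicationReport(appId);
      System.out.println("State:        " + report.getYarnApplicationState());
      System.out.println("Final status: " + report.getFinalApplicationStatus());
      // Sometimes empty/"N/A" for FINISHED applications, per this issue.
      System.out.println("Diagnostics:  " + report.getDiagnostics());
    } finally {
      yarnClient.stop();
    }
  }
}
{code}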



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3440) ResourceUsage should be copy-on-write

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3440:
-
Target Version/s:   (was: 2.8.0)

> ResourceUsage should be copy-on-write
> -
>
> Key: YARN-3440
> URL: https://issues.apache.org/jira/browse/YARN-3440
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler, yarn
>Reporter: Wangda Tan
>Assignee: Li Lu
>
> In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, 
> even though it is thread-safe, the Resource returned by its getters could be updated 
> by another thread.
> All Resource objects in ResourceUsage should be copy-on-write: a reader will 
> always get an unchanged Resource, and changes applied to a Resource acquired by a 
> caller will not affect the original Resource.
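
A rough sketch of the defensive-copy behaviour described above, assuming a {{Resources.clone}}-style copy in the getter; the holder class and locking shown here are illustrative and are not the actual ResourceUsage implementation.

{code}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Illustrative holder that hands out copies so callers cannot mutate shared state. */
public class CopyOnReadUsage {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Resource used = Resource.newInstance(0, 0);

  public Resource getUsed() {
    lock.readLock().lock();
    try {
      // Return a copy: the caller may mutate it freely without touching 'used'.
      return Resources.clone(used);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void setUsed(Resource newUsed) {
    lock.writeLock().lock();
    try {
      // Store a defensive copy as well, so later writes by the caller do not leak in.
      this.used = Resources.clone(newUsed);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}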



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5569) YARN logs out user when job is run when improperly configured

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5569:
-
Target Version/s:   (was: 2.8.0)

> YARN logs out user when job is run when improperly configured
> -
>
> Key: YARN-5569
> URL: https://issues.apache.org/jira/browse/YARN-5569
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Andrew Wang
>
> This is something that Vinay and I both hit while testing a 2.7.3 RC:
> {quote}
> I didn't have SSH to localhost set up (new laptop), and when I tried to run 
> the Pi job, it'd exit my window manager session. I feel there must be a more 
> developer-friendly solution here.
> {quote}
> {quote}
> Faced the same issue as Andrew Wang, while running the WordCount job for the first time 
> in my new Ubuntu installation, without 'configuring the shuffle handler 
> properly'. The whole session got logged out, closing all other open applications. 
> After configuring the shuffle handler properly, the job was successful though.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3603) Application Attempts page confusing

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3603:
-
Target Version/s:   (was: 2.8.0)

> Application Attempts page confusing
> ---
>
> Key: YARN-3603
> URL: https://issues.apache.org/jira/browse/YARN-3603
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Thomas Graves
>Assignee: Sunil G
> Attachments: 0001-YARN-3603.patch, 0002-YARN-3603.patch, 
> 0003-YARN-3603.patch, ahs1.png
>
>
> The application attempts page 
> (http://RM:8088/cluster/appattempt/appattempt_1431101480046_0003_01)
> is a bit confusing about what is going on.  I think the table of containers 
> there is only for Running containers, and when the app is completed or killed 
> it is empty.  The table should have a label on it stating so.  
> Also the "AM Container" field is a link when running but not when it is killed. 
>  That might be confusing.
> There is no link to the logs on this page but there is in the app attempt 
> table when looking at http://
> rm:8088/cluster/app/application_1431101480046_0003



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3610) FairScheduler: Add steady-fair-shares to the REST API documentation

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3610:
-
Target Version/s:   (was: 2.8.0)

> FairScheduler: Add steady-fair-shares to the REST API documentation
> ---
>
> Key: YARN-3610
> URL: https://issues.apache.org/jira/browse/YARN-3610
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Ray Chiang
>
> YARN-1050 adds documentation for FairScheduler REST API, but is missing the 
> steady-fair-share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3442) Consider abstracting out user, app limits etc into some sort of a LimitPolicy

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3442:
-
Target Version/s:   (was: 2.8.0)

> Consider abstracting out user, app limits etc into some sort of a LimitPolicy
> -
>
> Key: YARN-3442
> URL: https://issues.apache.org/jira/browse/YARN-3442
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Similar to the policies being added in YARN-3318 and YARN-3441 for leaf and 
> parent queues, we should consider extracting an abstraction for limits too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3441) Introduce the notion of policies for a parent queue

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3441:
-
Target Version/s:   (was: 2.8.0)

> Introduce the notion of policies for a parent queue
> ---
>
> Key: YARN-3441
> URL: https://issues.apache.org/jira/browse/YARN-3441
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Similar to the policy being added in YARN-3318 for leaf-queues, we need to 
> extend this notion to parent-queue too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5570) NodeManager blocks kill attempts during startup if ResourceManager is down

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5570:
-
Target Version/s:   (was: 2.8.0)

> NodeManager blocks kill attempts during startup if ResourceManager is down
> --
>
> Key: YARN-5570
> URL: https://issues.apache.org/jira/browse/YARN-5570
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.7.3
>Reporter: Andrew Wang
>
> Found while testing a 2.7.3 RC:
> {quote}
> * If you start the NodeManager and not the RM, the NM has a handler for 
> SIGTERM and SIGINT that blocked my Ctrl-C and kill attempts during startup. I 
> had to kill -9 it.
> {quote}
> [~ste...@apache.org] tells me that this is potentially addressed by YARN-679, 
> though that's a large patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-81) Make sure YARN declares correct set of dependencies

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-81:
---
Labels:   (was: BB2015-05-TBR)

> Make sure YARN declares correct set of dependencies
> ---
>
> Key: YARN-81
> URL: https://issues.apache.org/jira/browse/YARN-81
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Tom White
>Assignee: Junping Du
> Attachments: YARN-81-v2.patch, YARN-81.patch
>
>
> This is the equivalent of HADOOP-8278 for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-81) Make sure YARN declares correct set of dependencies

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-81:
---
Target Version/s:   (was: 2.8.0)

> Make sure YARN declares correct set of dependencies
> ---
>
> Key: YARN-81
> URL: https://issues.apache.org/jira/browse/YARN-81
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Tom White
>Assignee: Junping Du
> Attachments: YARN-81-v2.patch, YARN-81.patch
>
>
> This is the equivalent of HADOOP-8278 for YARN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-432) Documentation for Log Aggregation and log retrieval.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-432:

Target Version/s:   (was: 2.8.0)

> Documentation for Log Aggregation and log retrieval.
> 
>
> Key: YARN-432
> URL: https://issues.apache.org/jira/browse/YARN-432
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Mahadev konar
>Assignee: Vinod Kumar Vavilapalli
>
> Retrieving logs in 0.23 is very different from what 0.20.* does. This is a 
> very new feature which will require good documentation for users to get used 
> to it. Let's make sure we have some solid documentation for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4790) Per user blacklist node for user specific error for container launch failure.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4790:
-
Target Version/s:   (was: 2.8.0)

> Per user blacklist node for user specific error for container launch failure.
> -
>
> Key: YARN-4790
> URL: https://issues.apache.org/jira/browse/YARN-4790
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Reporter: Junping Du
>Assignee: Junping Du
>
> There are some user-specific errors for container launch failures, like:
> when LinuxContainerExecutor is enabled but some node doesn't have that user, 
> the container launch fails with the following information:
> {noformat}
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1434045496283_0036_02 State change from LAUNCHED to FAILED 
> 2016-02-14 15:37:03,111 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application 
> application_1434045496283_0036 failed 2 times due to AM Container for 
> appattempt_1434045496283_0036_02 exited with exitCode: -1000 due to: 
> Application application_1434045496283_0036 initialization failed 
> (exitCode=255) with output: User jdu not found 
> {noformat}
> Obviously, this node is not suitable for launching containers for this user's 
> other applications. We need a per-user blacklist tracking mechanism rather than 
> a per-application one now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4932) [Umbrella] YARN/MR test failures on Windows

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4932:
-
Target Version/s:   (was: 2.8.0)

> [Umbrella] YARN/MR test failures on Windows
> ---
>
> Key: YARN-4932
> URL: https://issues.apache.org/jira/browse/YARN-4932
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Junping Du
>
> We found several test failures related to Windows. Here is an umbrella JIRA to 
> track them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-941) RM Should have a way to update the tokens it has for a running application

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-941:

Target Version/s:   (was: 2.8.0)

> RM Should have a way to update the tokens it has for a running application
> --
>
> Key: YARN-941
> URL: https://issues.apache.org/jira/browse/YARN-941
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Robert Joseph Evans
>Assignee: Xuan Gong
> Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, 
> YARN-941.preview.4.patch, YARN-941.preview.patch
>
>
> When an application is submitted to the RM it includes with it a set of 
> tokens that the RM will renew on behalf of the application, that will be 
> passed to the AM when the application is launched, and will be used when 
> launching the application to access HDFS to download files on behalf of the 
> application.
> For long lived applications/services these tokens can expire, and then the 
> tokens that the AM has will be invalid, and the tokens that the RM had will 
> also not work to launch a new AM.
> We need to provide an API that will allow the RM to replace the current 
> tokens for this application with a new set.  To avoid any real race issues, I 
> think this API should be something that the AM calls, so that the client can 
> connect to the AM with a new set of tokens it got using Kerberos; the AM can 
> then inform the RM of the new set of tokens and quickly update its tokens 
> internally to use these new ones.
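
Purely as an illustration of the flow sketched above: a hypothetical AM-side hook, under the assumption that credentials arrive from the client as a {{Credentials}} bundle. Neither the interface nor the method exists in YARN; they only name the kind of call the description proposes.

{code}
import java.io.IOException;

import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ApplicationId;

/** Hypothetical AM-to-RM hook for the token-refresh flow proposed in this issue. */
public interface TokenUpdateClient {

  /**
   * Called by the AM after a client has pushed it fresh credentials obtained via
   * Kerberos; the AM forwards them so the RM can renew and relaunch with them.
   */
  void updateApplicationTokens(ApplicationId appId, Credentials freshCredentials)
      throws IOException;
}
{code}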



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-959) Add ApplicationAttemptId into container environment

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-959:

Target Version/s:   (was: 2.8.0)

> Add ApplicationAttemptId into container environment
> ---
>
> Key: YARN-959
> URL: https://issues.apache.org/jira/browse/YARN-959
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.1.0-beta, 2.6.0
>Reporter: Bikas Saha
>Assignee: Junping Du
>
> Currently, AMs and containers have to read the container id and then derive 
> the application attempt id from it. While it's convenient for YARN to tightly 
> couple the container id with the application attempt id, it is something that should 
> not be exposed to users. It's easy to let users know their application 
> attempt id via the env instead of requiring them to pull it out of the container 
> id.
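
For context, a minimal sketch of the derivation an AM has to do today, assuming the standard {{CONTAINER_ID}} environment variable is set for the AM container; the parsing goes through the 2.x-era {{ConverterUtils}} helper.

{code}
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AttemptIdFromEnv {
  public static ApplicationAttemptId currentAttemptId() {
    // The NM exports the container's own id in the CONTAINER_ID env variable.
    String containerIdStr = System.getenv(Environment.CONTAINER_ID.name());
    ContainerId containerId = ConverterUtils.toContainerId(containerIdStr);
    // The attempt id has to be derived from the container id; this coupling is
    // what the issue suggests hiding behind a dedicated env variable instead.
    return containerId.getApplicationAttemptId();
  }
}
{code}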



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-882) Specify per user quota for private/application cache and user log files

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-882:

Target Version/s:   (was: 2.8.0)

> Specify per user quota for private/application cache and user log files
> ---
>
> Key: YARN-882
> URL: https://issues.apache.org/jira/browse/YARN-882
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>
> At present there is no limit on the number of files / size of the files 
> localized by a single user. Similarly there is no limit on the size of the log 
> files created by a user via running containers.
> We need to restrict this per user.
> For LocalizedResources, this has serious concerns in case of a secured 
> environment where a malicious user can start one container and localize 
> resources whose total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. 
> Thereafter it will either fail (if no extra space is present on disk) or the 
> deletion service will keep removing localized files for other 
> containers/applications. 
> The limit for logs/localized resources should be decided by the RM and sent to the NM 
> via a secured containerToken. All these configurations should be per container 
> instead of per user or per NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1551) Allow user-specified reason for killApplication

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1551:
-
Target Version/s:   (was: 2.8.0)

> Allow user-specified reason for killApplication
> ---
>
> Key: YARN-1551
> URL: https://issues.apache.org/jira/browse/YARN-1551
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-1551.v01.patch, YARN-1551.v02.patch, 
> YARN-1551.v03.patch, YARN-1551.v04.patch, YARN-1551.v05.patch, 
> YARN-1551.v06.patch, YARN-1551.v06.patch
>
>
> This completes MAPREDUCE-5648



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1542) Add unit test for public resource on viewfs

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1542:
-
Target Version/s:   (was: 2.8.0)

> Add unit test for public resource on viewfs
> ---
>
> Key: YARN-1542
> URL: https://issues.apache.org/jira/browse/YARN-1542
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
> Attachments: YARN-1542.v01.patch, YARN-1542.v02.patch, 
> YARN-1542.v03.patch, YARN-1542.v04.patch, YARN-1542.v05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1052) Enforce submit application queue ACLs outside the scheduler

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1052:
-
Target Version/s:   (was: 2.8.0)

> Enforce submit application queue ACLs outside the scheduler
> ---
>
> Key: YARN-1052
> URL: https://issues.apache.org/jira/browse/YARN-1052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Xuan Gong
>
> Per discussion in YARN-899, schedulers should not need to enforce queue ACLs 
> on their own.  Currently schedulers do this for application submission, and 
> this should be done in the RM code instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1539) Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1539:
-
Target Version/s:   (was: 2.8.0)

> Queue admin ACLs should NOT be similar to submit-acls w.r.t hierarchy.
> --
>
> Key: YARN-1539
> URL: https://issues.apache.org/jira/browse/YARN-1539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>
> Today, Queue admin ACLs are similar to submit-acls w.r.t hierarchy in that if 
> one has to be able to administer a queue, he/she should be an admin of all 
> the queues in the ancestry - an unnecessary burden.
> This was added in YARN-899 and I believe it has the wrong semantics as well as the 
> wrong implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1225) FinishApplicationMasterRequest should also have a final IPC/RPC address.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1225:
-
Target Version/s:   (was: 2.8.0)

> FinishApplicationMasterRequest should also have a final IPC/RPC address.
> 
>
> Key: YARN-1225
> URL: https://issues.apache.org/jira/browse/YARN-1225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Junping Du
> Attachments: YARN-1225-kickOffTestDS.patch, YARN-1225-v1.patch, 
> YARN-1225-v2.patch, YARN-1225-v3.patch, YARN-1225-v4.1.patch, 
> YARN-1225-v4.patch
>
>
> AMs can already report a final HTTP URL via FinishApplicationMasterRequest, but 
> there is no field to report an IPC/RPC address.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1334:
-
Target Version/s:   (was: 2.8.0)

> YARN should give more info on errors when running failed distributed shell 
> command
> --
>
> Key: YARN-1334
> URL: https://issues.apache.org/jira/browse/YARN-1334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Attachments: YARN-1334.1.patch
>
>
> Running an incorrect command such as:
> /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
> -jar  -shell_command ./test1.sh -shell_script ./
> shows a shell exit code exception with no useful message. It should print 
> out the sysout/syserr of the containers/AM to explain why it is failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1334:
-
Labels:   (was: BB2015-05-TBR)

> YARN should give more info on errors when running failed distributed shell 
> command
> --
>
> Key: YARN-1334
> URL: https://issues.apache.org/jira/browse/YARN-1334
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications/distributed-shell
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Attachments: YARN-1334.1.patch
>
>
> Running an incorrect command such as:
> /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
> -jar  -shell_command ./test1.sh -shell_script ./
> shows a shell exit code exception with no useful message. It should print 
> out the sysout/syserr of the containers/AM to explain why it is failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2614) Cleanup synchronized method in SchedulerApplicationAttempt

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2614:
-
Target Version/s:   (was: 2.8.0)

> Cleanup synchronized method in SchedulerApplicationAttempt
> --
>
> Key: YARN-2614
> URL: https://issues.apache.org/jira/browse/YARN-2614
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Wangda Tan
>
> According to discussions in YARN-2594, there are some methods in 
> SchedulerApplicationAttempt that will be accessed by other modules, which can lead 
> to potential deadlock in the RM; we should clean them up as much as we can.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1272) Add a link to cluster/application page on node manager's list of application page

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1272:
-
Target Version/s:   (was: 2.8.0)

> Add a link to cluster/application page on node manager's list of application 
> page
> -
>
> Key: YARN-1272
> URL: https://issues.apache.org/jira/browse/YARN-1272
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Paul Han
>Assignee: Paul Han
> Attachments: NMApplicationPage.png, RMApplicationPage.png, 
> YARN-1272.patch, YARN-1272.patch
>
>
> On the node manager's application/application page, the content is significantly 
> less than the content on the resource manager's application page 
> /cluster/application.
> Adding a link from the nodemanager's application page to the resourcemanager's 
> application page will help users get info faster and more efficiently.
> Please see the screenshot for the benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806445#comment-15806445
 ] 

Miklos Szegedi commented on YARN-6060:
--

Does it help, if we use {{/bin/bash}}?

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied






[jira] [Updated] (YARN-3994) RM should respect AM resource/placement constraints

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3994:
-
Target Version/s:   (was: 2.8.0)

> RM should respect AM resource/placement constraints
> ---
>
> Key: YARN-3994
> URL: https://issues.apache.org/jira/browse/YARN-3994
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
>
> Today, locality and cpu for the AM can be specified in the AM launch 
> container request but are ignored at the RM. Locality is assumed to be ANY 
> and cpu is dropped. There may be other things too that are ignored. This 
> should be fixed so that the user gets what is specified in their code to 
> launch the AM. cc [~leftnoteasy] [~vvasudev] [~adhoot]






[jira] [Updated] (YARN-3991) Investigate if we need an atomic way to set both memory and CPU on Resource

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3991:
-
Target Version/s:   (was: 2.8.0)

> Investigate if we need an atomic way to set both memory and CPU on Resource
> ---
>
> Key: YARN-3991
> URL: https://issues.apache.org/jira/browse/YARN-3991
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>  Labels: capacityscheduler, fairscheduler, scheduler
>
> While reviewing another patch, I noticed that we have independent methods to 
> set memory and CPU. 
> Do we need another method to set them both atomically? Otherwise, could two 
> threads trying to set both values lose updates? 
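
As a hedged illustration of the race being asked about (the class below is a stand-in, not the real Resource API): two callers that each invoke separate setters can interleave, whereas a single method that sets both fields under one lock publishes them together.

{code}
// Illustrative holder, not org.apache.hadoop.yarn.api.records.Resource.
public class ResourceSketch {
  private long memoryMB;
  private int vcores;

  // Separate setters: thread A and thread B can interleave and end up
  // with memory from one update paired with vcores from the other.
  public synchronized void setMemoryMB(long memoryMB) { this.memoryMB = memoryMB; }
  public synchronized void setVcores(int vcores) { this.vcores = vcores; }

  // Atomic variant: both values are published together under one lock.
  public synchronized void set(long memoryMB, int vcores) {
    this.memoryMB = memoryMB;
    this.vcores = vcores;
  }

  @Override
  public synchronized String toString() {
    return "<memory:" + memoryMB + ", vCores:" + vcores + ">";
  }
}
{code}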






[jira] [Updated] (YARN-2835) YARN WebApps should be a public API

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2835:
-
Target Version/s:   (was: 2.8.0)

> YARN WebApps should be a public API
> ---
>
> Key: YARN-2835
> URL: https://issues.apache.org/jira/browse/YARN-2835
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> For application masters that need to host webservices and/or a UI, this is 
> common functionality that could be re-used across the ecosystem.  






[jira] [Updated] (YARN-2814) RM: Clean-up the handling of "fatal" events

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2814:
-
Target Version/s:   (was: 2.8.0)

> RM: Clean-up the handling of "fatal" events
> ---
>
> Key: YARN-2814
> URL: https://issues.apache.org/jira/browse/YARN-2814
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2814-0.patch
>
>
> YARN-2579 fixes a critical issue around handling fatal events in the RM, but 
> does so minimally. This JIRA is to follow through that approach and do more 
> clean-up.






[jira] [Updated] (YARN-2963) Helper library that allows requesting containers from multiple queues

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2963:
-
Target Version/s:   (was: 2.8.0)

> Helper library that allows requesting containers from multiple queues
> -
>
> Key: YARN-2963
> URL: https://issues.apache.org/jira/browse/YARN-2963
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2963-preview.patch
>
>
> As proposed on the mailing list (yarn-dev), it would be nice to have a way 
> for YARN applications to request containers from multiple queues. 
> e.g. Oozie might want to run a single AM for all user jobs and request one 
> container per launcher.
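
Purely as a sketch of the kind of helper being proposed (all names are hypothetical; no such class exists in YARN today), the library might expose something like a per-queue request interface on top of a single AM:

{code}
import java.util.concurrent.Future;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical API sketch only; not an existing YARN interface.
public interface MultiQueueContainerRequester {
  /** Ask for one container of the given size, charged to the given queue. */
  Future<Container> requestContainer(String queue, Resource capability);

  /** Release a container previously obtained through this helper. */
  void releaseContainer(Container container);
}
{code}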






[jira] [Updated] (YARN-2670) Adding feedback capability to capacity scheduler from external systems

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2670:
-
Target Version/s:   (was: 2.8.0)

> Adding feedback capability to capacity scheduler from external systems
> --
>
> Key: YARN-2670
> URL: https://issues.apache.org/jira/browse/YARN-2670
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>
> The sheer growth in data volume and Hadoop cluster size makes it a significant 
> challenge to diagnose and locate problems in a production-level cluster 
> environment efficiently and within a short period of time. Often, the 
> distributed monitoring systems are not capable of detecting a problem well in 
> advance when a large-scale Hadoop cluster starts to deteriorate in 
> performance or becomes unavailable. Thus, incoming workloads, scheduled 
> between the time when the cluster starts to deteriorate and the time when the 
> problem is identified, suffer from longer execution times. As a result, both 
> reliability and throughput of the cluster are reduced significantly. We address 
> this problem by proposing a system called Astro, which consists of a 
> predictive model and an extension to the Capacity Scheduler. The predictive 
> model in Astro takes into account a rich set of cluster behavioral 
> information that is collected by monitoring processes and models it using 
> machine learning algorithms to predict the future behavior of the cluster. The 
> Astro predictive model detects anomalies in the cluster and also identifies a 
> ranked set of metrics that have contributed the most towards the problem. The 
> Astro scheduler uses the prediction outcome and the list of metrics to decide 
> whether it needs to move and reduce workloads from the problematic cluster 
> nodes or to prevent additional workload allocations to them, in order to 
> improve both throughput and reliability of the cluster.
> This JIRA is only for adding feedback capabilities to the Capacity Scheduler 
> so that it can take feedback from external systems.






[jira] [Updated] (YARN-3683) Create an abstraction in ContainerExecutor for Container-script generation

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3683:
-
Target Version/s:   (was: 2.8.0)

> Create an abstraction in ContainerExecutor for Container-script generation
> --
>
> Key: YARN-3683
> URL: https://issues.apache.org/jira/browse/YARN-3683
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Before YARN-1964, Container-script generation was fundamentally driven by 
> ContainerLaunch object. After YARN-1964, this got pulled into 
> ContainerExecutor via the {{writeLaunchEnv()}} method which looks like an 
> API, but isn't.
> In addition, DefaultContainerExecutor itself has a plugin 
> {{LocalWrapperScriptBuilder}} which kind of does the same thing, but only for 
> Linux/Windows.
> We need to have a common API to override the script generation for 
> Linux/Windows/Docker etc.
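
A rough sketch of the kind of abstraction being asked for (the class name and method signature are hypothetical, not the existing writeLaunchEnv() API): the executor exposes one overridable hook that turns the launch environment and command into a platform-specific script.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the proposed hook; not the current ContainerExecutor API.
public abstract class ContainerScriptWriterSketch {
  /**
   * Write a launch script for the target platform (bash, cmd, docker, ...).
   * Subclasses for Linux, Windows or Docker override this one method instead
   * of each re-implementing script generation ad hoc.
   */
  public abstract void writeLaunchScript(OutputStream out,
      Map<String, String> environment,
      List<String> command) throws IOException;
}
{code}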






[jira] [Updated] (YARN-3720) Need comprehensive documentation for configuration CPU/memory resources on NodeManager

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3720:
-
Target Version/s:   (was: 2.8.0)

> Need comprehensive documentation for configuration CPU/memory resources on 
> NodeManager
> --
>
> Key: YARN-3720
> URL: https://issues.apache.org/jira/browse/YARN-3720
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Varun Vasudev
>
> Things are getting more and more complex after the likes of YARN-160. We need 
> a document explaining how to configure cpu/memory values on a NodeManager.
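
As a minimal example of the kind of knobs such a document would need to cover (the values below are arbitrary; the property names are the standard NodeManager resource settings):

{code}
import org.apache.hadoop.conf.Configuration;

public class NodeResourceConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Total resources the NodeManager advertises to the RM (example values only).
    conf.setInt("yarn.nodemanager.resource.memory-mb", 8192);
    conf.setInt("yarn.nodemanager.resource.cpu-vcores", 8);
    System.out.println(conf.get("yarn.nodemanager.resource.memory-mb"));
  }
}
{code}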






[jira] [Updated] (YARN-3555) Incorrect format for the container log url returned by ContainerReport.

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3555:
-
Target Version/s:   (was: 2.8.0)

> Incorrect format for the container log url returned by ContainerReport.
> ---
>
> Key: YARN-3555
> URL: https://issues.apache.org/jira/browse/YARN-3555
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Reporter: Spandan Dutta
>Assignee: Spandan Dutta
>
> Assume that we have a ContainerReport object cr.
> When we call cr.getLogUrl() to get the log URL for the container, the URL 
> returned is of the form //. This looks incorrect, as it should be 
> prefixed with the correct protocol. 






[jira] [Updated] (YARN-2792) Have a public Test-only API for creating important records that ecosystem projects can depend on

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2792:
-
Target Version/s:   (was: 2.8.0)

> Have a public Test-only API for creating important records that ecosystem 
> projects can depend on
> 
>
> Key: YARN-2792
> URL: https://issues.apache.org/jira/browse/YARN-2792
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>
> From YARN-2789,
> {quote}
> Sigh.
> Even though this is a private API, it will be used by downstream projects 
> for testing. It'll be useful for this to be re-instated, maybe with a 
> deprecated annotation, so that older versions of downstream projects can 
> build against Hadoop 2.6.
> I am inclined to have a separate test-only public util API that keeps 
> compatibility for tests. Rather than opening unwanted APIs up. I'll file a 
> separate ticket for this, we need all YARN apps/frameworks to move to that 
> API instead of these private unstable APIs.
> For now, I am okay keeping a private compat for the APIs changed in YARN-2698.
> {quote}






[jira] [Updated] (YARN-3746) NotFoundException(404) will java.lang.IllegalStateException: STREAM when accepting XML as the content

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3746:
-
Target Version/s:   (was: 2.8.0)

> NotFoundException(404) will java.lang.IllegalStateException: STREAM when 
> accepting XML as the content
> -
>
> Key: YARN-3746
> URL: https://issues.apache.org/jira/browse/YARN-3746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Both the RM and ATS REST APIs are affected. The weird thing is that it only 
> happens with 404, not with other error codes, and it only happens with XML, 
> not with JSON.
> {code}
> zshens-mbp:Deployment zshen$ curl -H "Accept: application/xml" -H 
> "Content-Type:application/xml" 
> http://localhost:8188/ws/v1/applicationhistory/apps/application_1432863609211_0001
> 
> 
> 
> Error 500 STREAM
> 
> HTTP ERROR 500
> Problem accessing 
> /ws/v1/applicationhistory/apps/application_1432863609211_0001. Reason:
> STREAMCaused 
> by:java.lang.IllegalStateException: STREAM
>   at org.mortbay.jetty.Response.getWriter(Response.java:616)
>   at org.apache.hadoop.yarn.webapp.View.writer(View.java:141)
>   at org.apache.hadoop.yarn.webapp.view.TextView.writer(TextView.java:39)
>   at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:60)
>   at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:80)
>   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:81)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:145)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:602)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:277)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> 

[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2771:
-
Target Version/s:   (was: 2.8.0)

> DistributedShell's DSConstants are badly named
> --
>
> Key: YARN-2771
> URL: https://issues.apache.org/jira/browse/YARN-2771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: YARN-2771.1.patch, YARN-2771.2.patch, YARN-2771.3.patch
>
>
> I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of 
> DISTRIBUTEDSHELLTIMELINEDOMAIN).
> DISTRIBUTEDSHELLTIMELINEDOMAIN was added in this release; can we rename it to 
> DISTRIBUTED_SHELL_TIMELINE_DOMAIN?
> For the old envs, we can just add new envs that point to the old ones and 
> deprecate the old ones.
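
A minimal sketch of the rename-plus-deprecate pattern being proposed (the constant value shown is an assumption, not taken from the actual DSConstants class):

{code}
// Sketch only; not the actual DSConstants class.
public class DSConstantsSketch {
  /** New, readable name. */
  public static final String DISTRIBUTED_SHELL_TIMELINE_DOMAIN =
      "DISTRIBUTEDSHELLTIMELINEDOMAIN";

  /** Old name kept for compatibility and marked deprecated. */
  @Deprecated
  public static final String DISTRIBUTEDSHELLTIMELINEDOMAIN =
      DISTRIBUTED_SHELL_TIMELINE_DOMAIN;
}
{code}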






[jira] [Updated] (YARN-2772) DistributedShell's timeline related options are not clear

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2772:
-
Target Version/s:   (was: 2.8.0)

> DistributedShell's timeline related options are not clear
> -
>
> Key: YARN-2772
> URL: https://issues.apache.org/jira/browse/YARN-2772
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Zhijie Shen
> Attachments: YARN-2772.1.patch, YARN-2772.2.patch
>
>
> The new "domain" and "create" options are not descriptive at 
> all. It is also not clear when view_acls and modify_acls need to be set.






[jira] [Updated] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2892:
-
Target Version/s:   (was: 2.8.0)

> Unable to get AMRMToken in unmanaged AM when using a secure cluster
> ---
>
> Key: YARN-2892
> URL: https://issues.apache.org/jira/browse/YARN-2892
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/unmanaged-AM-launcher, resourcemanager
>Reporter: Sevada Abraamyan
>Assignee: Sevada Abraamyan
> Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch
>
>
> An AMRMToken is retrieved from the ApplicationReport by the YarnClient. 
> When the RM creates the ApplicationReport and sends it back to the client, it 
> makes a simple security check on whether it should include the AMRMToken in the 
> report (see createAndGetApplicationReport in RMAppImpl). This security check 
> verifies that the user who submitted the original application is the same 
> user who is requesting the ApplicationReport. If they are indeed the same 
> user, then it includes the AMRMToken; otherwise it does not include it.
> The problem arises from the fact that when an application is submitted, the 
> RM saves the short username of the user who created the application (see 
> submitApplication in ClientRmService). Afterwards, when the ApplicationReport 
> is requested, the system tries to match the full username of the requester 
> against the previously stored short username. 
> In a secure cluster using Kerberos this check fails because the realm portion 
> of the principal is stripped when we request a short username. So, for example, 
> the short username might be "Foo" whereas the full username is 
> "f...@company.com"
> Note: A very similar problem has been previously reported 
> ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232])
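
To make the mismatch concrete, a small hedged sketch of the check (not the actual RMAppImpl code): the stored short name will never equal the requester's full principal unless the comparison also shortens the requester's name.

{code}
import org.apache.hadoop.security.UserGroupInformation;

public class ShortNameCheckSketch {
  // Sketch of the check described above; not the real RM code.
  static boolean callerMatches(String storedShortName, UserGroupInformation caller) {
    // Comparing storedShortName against caller.getUserName() (the full
    // principal, e.g. a hypothetical "foo@EXAMPLE.COM") fails on a Kerberos
    // cluster; comparing short name to short name is what the report asks for.
    return storedShortName.equals(caller.getShortUserName());
  }
}
{code}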






[jira] [Updated] (YARN-3181) FairScheduler: Fix up outdated findbugs issues

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3181:
-
Target Version/s:   (was: 2.8.0)

> FairScheduler: Fix up outdated findbugs issues
> --
>
> Key: YARN-3181
> URL: https://issues.apache.org/jira/browse/YARN-3181
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Brahma Reddy Battula
> Attachments: YARN-3181-002.patch, yarn-3181-1.patch
>
>
> In FairScheduler, we have excluded some findbugs-reported errors. Some of 
> them aren't applicable anymore, and there are a few that can be easily fixed 
> without needing an exclusion. It would be nice to fix them. 






[jira] [Updated] (YARN-3073) FS rack-local requests wait for node-locality delay even in the absence of node-local requests

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3073:
-
Target Version/s:   (was: 2.8.0)

> FS rack-local requests wait for node-locality delay even in the absence of 
> node-local requests
> --
>
> Key: YARN-3073
> URL: https://issues.apache.org/jira/browse/YARN-3073
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>
> YARN-2990 fixes the locality delay issue in FS for off-switch requests. This 
> JIRA is to handle rack-local requests. 






[jira] [Updated] (YARN-3274) FIFOScheduler should assign the same container number up to ResourceRequest ask for ANY

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3274:
-
Target Version/s:   (was: 2.8.0)

> FIFOScheduler should assign the same container number up to ResourceRequest 
> ask for ANY
> ---
>
> Key: YARN-3274
> URL: https://issues.apache.org/jira/browse/YARN-3274
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>
> Per the discussion in MAPREDUCE-5583, FifoScheduler should assign containers 
> only up to the number of containers specified in the ResourceRequest for ANY, 
> which is not true today. It only checks that the number in the ResourceRequest 
> for ANY is > 0, then goes ahead and allocates containers according to the other 
> locality requests, which can exceed the number in ANY. 
> This behavior is not consistent with the other schedulers, like CS or FS, and 
> does not meet our expectation, so we should fix it.
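
A hedged sketch of the capping behaviour being requested (names are illustrative, not the FifoScheduler internals): stop assigning once the count asked for at ANY is exhausted, instead of merely checking that it is non-zero.

{code}
// Illustrative only; not the real FifoScheduler code.
public class AnyCapSketch {
  /**
   * @param anyAsk         number of containers requested for ResourceRequest ANY
   * @param assignableHere containers that locality would allow on this node
   * @return how many containers to actually assign
   */
  static int containersToAssign(int anyAsk, int assignableHere) {
    if (anyAsk <= 0) {
      return 0;                              // nothing outstanding
    }
    return Math.min(anyAsk, assignableHere); // never exceed the ANY ask
  }
}
{code}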






[jira] [Updated] (YARN-1813) Better error message for "yarn logs" when permission denied

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1813:
-
Target Version/s:   (was: 2.8.0)

> Better error message for "yarn logs" when permission denied
> ---
>
> Key: YARN-1813
> URL: https://issues.apache.org/jira/browse/YARN-1813
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0, 2.4.1, 2.5.1
>Reporter: Andrew Wang
>Assignee: Tsuyoshi Ozawa
>Priority: Minor
> Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, 
> YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch, YARN-1813.6.patch
>
>
> I ran some MR jobs as the "hdfs" user, and then forgot to sudo -u when 
> grabbing the logs. "yarn logs" prints an error message like the following:
> {noformat}
> [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
> 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at 
> a2402.halxg.cloudera.com/10.20.212.10:8032
> Logs not available at 
> /tmp/logs/andrew.wang/logs/application_1394482121761_0010
> Log aggregation has not completed or is not enabled.
> {noformat}
> It'd be nicer if it said "Permission denied" or "AccessControlException" or 
> something like that instead, since that's the real issue.
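
A minimal sketch of the kind of distinction being asked for (hypothetical helper, not the actual "yarn logs" implementation): report AccessControlException as a permission problem instead of folding it into the generic "not available" message.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.AccessControlException;

public class LogDirCheckSketch {
  // Hypothetical helper, not the real LogsCLI code.
  static void checkLogDir(Configuration conf, Path remoteAppLogDir)
      throws IOException {
    FileSystem fs = remoteAppLogDir.getFileSystem(conf);
    try {
      fs.listStatus(remoteAppLogDir);
    } catch (AccessControlException e) {
      // Surface the real cause instead of "Log aggregation has not completed".
      System.err.println("Permission denied reading " + remoteAppLogDir
          + " as " + System.getProperty("user.name"));
      throw e;
    }
  }
}
{code}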






[jira] [Updated] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3332:
-
Target Version/s:   (was: 2.8.0)

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource 
> usage and overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only a limited usage: 
> automatically determining the capacity of resources on node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.






[jira] [Updated] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2127:
-
Target Version/s:   (was: 2.8.0)

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.
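
A hedged sketch of what a hadoop-common variant could look like and how a launcher might install it (the class name is made up; only Thread's standard API is used):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical hadoop-common superclass; YarnUncaughtExceptionHandler
// would then become a thin subclass kept for compatibility.
public class HadoopUncaughtExceptionHandlerSketch
    implements Thread.UncaughtExceptionHandler {
  private static final Logger LOG =
      LoggerFactory.getLogger(HadoopUncaughtExceptionHandlerSketch.class);

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    LOG.error("Thread " + t.getName() + " threw an uncaught exception", e);
  }

  public static void main(String[] args) {
    // What a generic service launcher could do automatically.
    Thread.setDefaultUncaughtExceptionHandler(
        new HadoopUncaughtExceptionHandlerSketch());
  }
}
{code}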






[jira] [Updated] (YARN-5173) Enable Token Support for AHSClient

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5173:
-
Target Version/s:   (was: 2.8.0)

> Enable Token Support for AHSClient
> --
>
> Key: YARN-5173
> URL: https://issues.apache.org/jira/browse/YARN-5173
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>
> In a scenario where the YarnClient can't find an application during the 
> getApplicationReport method call, it falls back to the AHS method(s) via RPC, 
> which throws an AccessControlException because token support for AHS is disabled.
> {code}
> java.io.IOException: Failed on local exception: java.io.IOException: 
> org.apache.hadoop.security.AccessControlException: Client cannot authenticate 
> via:[TYPE_OF_AUTH]; Host Details : local host is: "1.2.3.4"; destination host 
> is: "ahs-address:ahs-port;
> {code}






[jira] [Updated] (YARN-535) TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during write phase, breaks later test runs

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-535:

Target Version/s:   (was: 2.8.0)

> TestUnmanagedAMLauncher can corrupt target/test-classes/yarn-site.xml during 
> write phase, breaks later test runs
> 
>
> Key: YARN-535
> URL: https://issues.apache.org/jira/browse/YARN-535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.6.0
> Environment: OS/X laptop, HFS+ filesystem
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-535-02.patch, YARN-535.patch
>
>
> the setup phase of {{TestUnmanagedAMLauncher}} overwrites {{yarn-site.xml}}. 
> As {{Configuration.writeXml()}} does a reread of all resources, this will 
> break if the (open-for-writing) resource is already visible as an empty file. 
> This leaves a corrupted {{target/test-classes/yarn-site.xml}}, which breaks 
> later test runs, because it is not overwritten by later incremental builds 
> due to timestamps.
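
One common fix, sketched here with plain java.nio rather than the actual patch: write the generated yarn-site.xml to a temporary file first and move it into place atomically, so a half-written file is never visible to a concurrent reread.

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import org.apache.hadoop.conf.Configuration;

public class SafeConfigWriteSketch {
  // Illustrative only; not the fix that was actually committed.
  static void writeConfig(Configuration conf, Path target) throws IOException {
    Path tmp = Files.createTempFile(target.getParent(), "yarn-site", ".xml.tmp");
    try (OutputStream out = Files.newOutputStream(tmp)) {
      conf.writeXml(out);   // the reread of resources still sees the old, intact file
    }
    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
  }

  public static void main(String[] args) throws IOException {
    writeConfig(new Configuration(false),
        Paths.get("target/test-classes/yarn-site.xml"));
  }
}
{code}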






[jira] [Updated] (YARN-1843) LinuxContainerExecutor should always log output

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1843:
-
Target Version/s:   (was: 2.8.0)

> LinuxContainerExecutor should always log output
> ---
>
> Key: YARN-1843
> URL: https://issues.apache.org/jira/browse/YARN-1843
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
>Priority: Trivial
> Attachments: YARN-1843-1.diff, YARN-1843-2.diff, YARN-1843.diff
>
>
> If debug is enabled, LinuxContainerExecutor should always log output after 
> shExec.execute().
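
A hedged sketch of the intent (not the actual LinuxContainerExecutor diff), using Shell.ShellCommandExecutor's existing getOutput():

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class LceDebugLogSketch {
  private static final Log LOG = LogFactory.getLog(LceDebugLogSketch.class);

  // Sketch of the intent; the real change lives in LinuxContainerExecutor.
  static void logOutputIfDebug(ShellCommandExecutor shExec) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("container-executor output: " + shExec.getOutput());
    }
  }
}
{code}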






[jira] [Updated] (YARN-4212) FairScheduler: Parent queues is not allowed to be 'Fair' policy if its children have the "drf" policy

2017-01-06 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4212:
---
Attachment: YARN-4212.005.patch

> FairScheduler: Parent queues is not allowed to be 'Fair' policy if its 
> children have the "drf" policy
> -
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Yufei Gu
>  Labels: fairscheduler
> Attachments: YARN-4212.002.patch, YARN-4212.003.patch, 
> YARN-4212.004.patch, YARN-4212.005.patch, YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it only computes a weight for memory and 
> sets the vcores fair share of its children to 0.
> Assuming a situation where we have 1 parent queue with policy 'fair' and 
> multiple leaf queues with policy 'drf', any app submitted to the child queues 
> with a vcore requirement > 1 will always be above fair share, since during the 
> recomputeShares process the child queues were all assigned 0 for fair share 
> vcores.






[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806409#comment-15806409
 ] 

Allen Wittenauer commented on YARN-6060:


This is basically the point that [~templedf] was making: if yarn's PATH 
contains a place where files can be written, it's very easy to get root (since 
c-e will inherit it) to execute any program called 'bash'.

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied






[jira] [Commented] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806370#comment-15806370
 ] 

Hadoop QA commented on YARN-4148:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 220 unchanged - 2 fixed = 221 total (was 222) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 41s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 53s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-4148 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12842873/YARN-4148.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5a5e206acb7a 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 71a4acf |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/14594/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/14594/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14594/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| 

[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806353#comment-15806353
 ] 

Miklos Szegedi commented on YARN-6060:
--

Thank you for the comment, [~aw]. As I said, I agree that noexec should not be 
set on node manager directories. That does not mean that, if it is set, YARN should 
completely fail and not run any jobs. Can you be more specific about "huge 
hole on misconfigured systems"?

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied






[jira] [Commented] (YARN-6057) yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* descriptions are incorrect about behavior when a request is out of bounds

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806304#comment-15806304
 ] 

Hadoop QA commented on YARN-6057:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
36s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6057 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846104/YARN-6057.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux 59d94072e986 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 71a4acf |
| Default Java | 1.8.0_111 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14595/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14595/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> yarn.scheduler.minimum-allocation-* and yarn.scheduler.maximum-allocation-* 
> descriptions are incorrect about behavior when a request is out of bounds
> -
>
> Key: YARN-6057
> URL: https://issues.apache.org/jira/browse/YARN-6057
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Julia Sommer
>Priority: Minor
> Attachments: YARN-6057.001.patch
>
>
> {code}
> <property>
>   <description>The minimum allocation for every container request at the RM,
>   in terms of virtual CPU cores. Requests lower than this will throw a
>   InvalidResourceRequestException.</description>
>   <name>yarn.scheduler.minimum-allocation-vcores</name>
>   <value>1</value>
> </property>
> {code}
> 

[jira] [Comment Edited] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2017-01-06 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806246#comment-15806246
 ] 

Sidharta Seethana edited comment on YARN-5258 at 1/7/17 12:32 AM:
--

[~templedf], the doc looks good. There is one thing, though : 

{quote}
the entry point will be ignored when LCE launches the image because the
LCE always specifies the command to execute as YARN's container launch script.
{quote}

Actually, images with ‘entrypoint’s won’t work correctly - the entry point is 
not ignored, the launch command (in this case launch_container.sh) will be sent 
as an argument to the entrypoint which is completely unexpected. The only 
scenario where this could work is if 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is in use - where the entry 
point is used with the default command. We’ll need to find a fix/work around 
for this for other scenarios. 


was (Author: sidharta-s):
[~templedf], the doc looks good. The is one thing, though : 

{quote}
the entry point will be ignored when LCE launches the image because the
LCE always specifies the command to execute as YARN's container launch script.
{quote}

Actually, images with ‘entrypoint’s won’t work correctly - the entry point is 
not ignored, the launch command (in this case launch_container.sh) will be sent 
as an argument to the entrypoint which is completely unexpected. The only 
scenario where this could work is if 
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE is in use - where the entry 
point is used with the default command. We’ll need to find a fix/work around 
for this for other scenarios. 

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Commented] (YARN-5714) ContainerExecutor does not order environment map

2017-01-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806273#comment-15806273
 ] 

Miklos Szegedi commented on YARN-5714:
--

Thank you for the comment [~templedf].
That option raises the question of whether to prepend or append to the client list 
in {{sanitizeEnv()}} and in the code that populates it for the context, and whether 
the assumption holds all the time. The serialization code in 
{{addEnvToProto()}} needs to be changed to iterate in LinkedHashMap order, and 
the deserialization code in {{initEnv()}} to create a LinkedHashMap. Moreover, 
since it relies on the client and the application to provide the right order all 
the time, it probably involves changes outside YARN.
Do you also suggest changing the container launch context interface to accept 
a linked hash map or an iterator instead of a map? Or is the plan to keep the 
interface but handle it correctly if it is a linked hash map?
A positive effect of the LinkedHashMap is that the ordering algorithm does not 
need to run at every container launch.
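
For illustration of the LinkedHashMap point above, a minimal sketch (not the YARN serialization code): iteration order of a LinkedHashMap is insertion order, so an environment map built that way can be dumped to the launch script with dependencies first, whereas a plain HashMap gives no such guarantee.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvOrderSketch {
  public static void main(String[] args) {
    // Insertion order is preserved, so the variable that others reference
    // can be exported before the variables that use it.
    Map<String, String> env = new LinkedHashMap<>();
    env.put("HADOOP_COMMON_HOME", "/opt/hadoop");
    env.put("LD_LIBRARY_PATH", "$HADOOP_COMMON_HOME/lib/native");

    StringBuilder script = new StringBuilder();
    for (Map.Entry<String, String> e : env.entrySet()) {
      script.append("export ").append(e.getKey())
            .append("=\"").append(e.getValue()).append("\"\n");
    }
    System.out.print(script);
  }
}
{code}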


> ContainerExecutor does not order environment map
> 
>
> Key: YARN-5714
> URL: https://issues.apache.org/jira/browse/YARN-5714
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1
> Environment: all (linux and windows alike)
>Reporter: Remi Catherinot
>Assignee: Remi Catherinot
>Priority: Trivial
>  Labels: oct16-medium
> Attachments: YARN-5714.001.patch, YARN-5714.002.patch, 
> YARN-5714.003.patch, YARN-5714.004.patch, YARN-5714.005.patch, 
> YARN-5714.006.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> when dumping the launch container script, environment variables are dumped 
> based on the order internally used by the map implementation (hash based). It 
> does not take into consideration that some env variables may refer to each 
> other, and so some env variables must be declared before those 
> referencing them.
> In my case, I ended up having LD_LIBRARY_PATH, which depends on 
> HADOOP_COMMON_HOME, dumped before HADOOP_COMMON_HOME. Thus it had a 
> wrong value, and so native libraries weren't loaded. Jobs were running, but not 
> at their best efficiency. This is just one use case falling into that bug, but 
> I'm sure others may happen as well.
> I already have a patch running in my production environment; I estimate 
> about 5 days for packaging the patch in the right fashion for JIRA and trying 
> my best to add tests.
> Note: the patch is not OS-aware, with a default empty implementation. I will 
> only implement the Unix version in a first release. I'm not used to Windows env 
> variable syntax, so it will take me more time/research for it.






[jira] [Issue Comment Deleted] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2017-01-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-5585:
--
Comment: was deleted

(was: [~varun_saxena], can you commit this patch, or shall I? Do let me know. 
Thanks!)

> [Atsv2] Reader side changes for entity prefix and support for pagination via 
> additional filters
> ---
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: yarn-5355-merge-blocker
> Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, 
> YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, 
> YARN-5585-YARN-5355.0004.patch, YARN-5585-YARN-5355.0005.patch, 
> YARN-5585-YARN-5355.0006.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve 
> applications. Along with those, it would be good to add a new filter, i.e. fromId, 
> so that entities can be retrieved after the fromId. 
> Current behavior: the default limit is set to 100. If there are 1000 entities, 
> then the REST call gives the first/last 100 entities. How do we retrieve the next 
> set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications are stored in the database as app-1, app-2 ... app-10,
> *getApps?limit=5* gives app-1 to app-5. But to retrieve the next 5 apps, there is 
> no way to achieve this. 
> So the proposal is to have fromId in the filter, like 
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to 
> app-10. 
> Since ATS is targeting storage of a large number of entities, it is a very common 
> use case to get the next set of entities using fromId rather than querying all 
> the entities. This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5585) [Atsv2] Reader side changes for entity prefix and support for pagination via additional filters

2017-01-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806258#comment-15806258
 ] 

Sangjin Lee commented on YARN-5585:
---

[~varun_saxena], can you commit this patch, or shall I? Do let me know. Thanks!

> [Atsv2] Reader side changes for entity prefix and support for pagination via 
> additional filters
> ---
>
> Key: YARN-5585
> URL: https://issues.apache.org/jira/browse/YARN-5585
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: yarn-5355-merge-blocker
> Attachments: 0001-YARN-5585.patch, YARN-5585-YARN-5355.0001.patch, 
> YARN-5585-YARN-5355.0002.patch, YARN-5585-YARN-5355.0003.patch, 
> YARN-5585-YARN-5355.0004.patch, YARN-5585-YARN-5355.0005.patch, 
> YARN-5585-YARN-5355.0006.patch, YARN-5585-workaround.patch, YARN-5585.v0.patch
>
>
> The TimelineReader REST APIs provide a lot of filters to retrieve
> applications. Along with those, it would be good to add a new filter, fromId,
> so that entities can be retrieved after the fromId.
> Current behavior: the default limit is set to 100. If there are 1000 entities,
> the REST call returns the first/last 100 entities. How does one retrieve the
> next set of 100 entities, i.e. 101 to 200 or 900 to 801?
> Example: if applications app-1, app-2, ..., app-10 are stored in the database,
> *getApps?limit=5* gives app-1 to app-5, but there is no way to retrieve the
> next 5 apps.
> So the proposal is to have fromId in the filter, e.g.
> *getApps?limit=5&fromId=app-5*, which gives the list of apps from app-6 to
> app-10.
> Since ATS targets storage of a large number of entities, fetching the next set
> of entities using fromId rather than querying all the entities is a very common
> use case. This is very useful for pagination in the web UI.






[jira] [Commented] (YARN-5991) Yarn Distributed Shell does not print throwable t to App Master When failed to start container

2017-01-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806257#comment-15806257
 ] 

Hudson commented on YARN-5991:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11082 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11082/])
YARN-5991. Yarn Distributed Shell does not print throwable t to App (templedf: 
rev 71a4acf74bc9ca34f0e57835c9d6e3efbe7c0567)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


> Yarn Distributed Shell does not print throwable t to App Master When failed 
> to start container
> --
>
> Key: YARN-5991
> URL: https://issues.apache.org/jira/browse/YARN-5991
> Project: Hadoop YARN
>  Issue Type: Improvement
> Environment: apache hadoop 2.7.1, centos 6.5
>Reporter: dashwang
>Assignee: Jim Frankola
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0-alpha2
>
> Attachments: YARN-5991.001.patch
>
>
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_03
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_04
> 16/12/12 16:27:20 INFO impl.NMClientAsyncImpl: Processing Event EventType: 
> START_CONTAINER for Container container_1481517162158_0027_01_02
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave02:22710
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> slave01:34140
> 16/12/12 16:27:20 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 
> master:52037
> 16/12/12 16:27:20 ERROR launcher.ApplicationMaster: Failed to start Container 
> container_1481517162158_0027_01_02
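For reference, a hedged sketch of the kind of change the summary describes: passing the Throwable to the logger so the failure cause reaches the ApplicationMaster log instead of only the container id. This is not the committed patch and it assumes an SLF4J-style logger:

{code}
// Hedged sketch, not the committed patch: include the Throwable when logging a
// container-start failure so its stack trace is printed, not just the container id.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StartContainerErrorLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(StartContainerErrorLogging.class);

  // Called from an NMClientAsync-style callback when a container fails to start.
  static void onStartContainerError(String containerId, Throwable t) {
    // Before: LOG.error("Failed to start Container " + containerId);
    // After: pass the Throwable so SLF4J prints its stack trace as well.
    LOG.error("Failed to start Container {}", containerId, t);
  }
}
{code}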






[jira] [Commented] (YARN-5258) Document Use of Docker with LinuxContainerExecutor

2017-01-06 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806246#comment-15806246
 ] 

Sidharta Seethana commented on YARN-5258:
-

[~templedf], the doc looks good. There is one thing, though:

{quote}
the entry point will be ignored when LCE launches the image because the
LCE always specifies the command to execute as YARN's container launch script.
{quote}

Actually, images with entrypoints won't work correctly: the entry point is not
ignored; the launch command (in this case launch_container.sh) will be sent as
an argument to the entrypoint, which is completely unexpected. The only scenario
where this could work is if YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE
is in use, where the entry point is used with the default command. We'll need to
find a fix/workaround for this for other scenarios.
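A minimal sketch of the Docker semantics being described, assuming a hypothetical image entrypoint and launch-script path (this is not container-executor code):

{code}
// Hypothetical illustration of Docker entrypoint semantics: the command passed at
// run time is appended to the image's ENTRYPOINT as arguments instead of replacing it.
import java.util.ArrayList;
import java.util.List;

public class EntrypointDemo {
  public static void main(String[] args) {
    // Entry point baked into the image, e.g. ENTRYPOINT ["/opt/app/start.sh"].
    List<String> entrypoint = List.of("/opt/app/start.sh");
    // Command the container runtime is asked to run: YARN's container launch script.
    List<String> launchCmd = List.of("bash", "/path/to/launch_container.sh");

    // Docker appends the run command to the entrypoint as arguments.
    List<String> effective = new ArrayList<>(entrypoint);
    effective.addAll(launchCmd);

    // Prints: /opt/app/start.sh bash /path/to/launch_container.sh
    // i.e. launch_container.sh is passed as an argument, never executed itself.
    System.out.println(String.join(" ", effective));
  }
}
{code}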

> Document Use of Docker with LinuxContainerExecutor
> --
>
> Key: YARN-5258
> URL: https://issues.apache.org/jira/browse/YARN-5258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>  Labels: oct16-easy
> Attachments: YARN-5258.001.patch, YARN-5258.002.patch, 
> YARN-5258.003.patch, YARN-5258.004.patch
>
>
> There aren't currently any docs that explain how to configure Docker and all 
> of its various options aside from reading all of the JIRAs.  We need to 
> document the configuration, use, and troubleshooting, along with helpful 
> examples.






[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-06 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806155#comment-15806155
 ] 

Allen Wittenauer commented on YARN-6060:


{code}
+#ifdef __linux
{code}

This looks like a vendor-ism creeping in.  Various contributors test on and use
more than just Linux (and yes, LCE works just fine on them).

I'm assuming that people are setting noexec out of some false sense of security.
It's pure theatrics to say that noexec provides any sort of protection to a
system like Hadoop.  There are lots of ways around it, never mind that Java
itself is perfectly capable (albeit usually in crappy ways) of doing just as
much harm as anything else.

At this point, I don't think this patch should go in, simply because it sends
the wrong message, isn't particularly useful, and opens up a huge hole on
misconfigured systems.

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied
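A small sketch of the underlying noexec behaviour (illustrative paths, not LCE code): executing the script file directly fails, while invoking it through the interpreter still works, because only the interpreter binary needs its exec bit honoured.

{code}
// Illustrative only, not LCE code: direct exec of a script on a noexec mount fails
// with "Permission denied", but running it via the shell interpreter still works.
import java.io.IOException;

public class NoexecDemo {
  public static void main(String[] args) throws IOException, InterruptedException {
    String script = "/tmp/noexec-mount/launch_container.sh";  // hypothetical path

    try {
      new ProcessBuilder(script).inheritIO().start().waitFor();      // direct exec
    } catch (IOException e) {
      System.out.println("Direct exec failed: " + e.getMessage());   // Permission denied
    }

    // Only /bin/bash needs to be executable; the script itself is merely read.
    int rc = new ProcessBuilder("bash", script).inheritIO().start().waitFor();
    System.out.println("Via interpreter, exit code: " + rc);
  }
}
{code}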






[jira] [Updated] (YARN-5864) YARN Capacity Scheduler - Queue Priorities

2017-01-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-5864:
-
Attachment: YARN-5864.002.patch

Attached ver.2 patch, which handles the unit test failures and the
javadoc/findbugs warnings.

The biggest change is that it adds logic to move reservations around (for
example, it is not possible to preempt containers in order to allocate a
reserved container).

See {{TestCapacitySchedulerSurgicalPreemption#testPriorityPreemptionRequiresMoveReservation}}
as an example.

> YARN Capacity Scheduler - Queue Priorities
> --
>
> Key: YARN-5864
> URL: https://issues.apache.org/jira/browse/YARN-5864
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-5864.001.patch, YARN-5864.002.patch, 
> YARN-5864.poc-0.patch, YARN-CapacityScheduler-Queue-Priorities-design-v1.pdf
>
>
> Currently, the Capacity Scheduler at every parent-queue level uses the relative
> used-capacities of the child queues to decide which queue gets the next
> available resource first.
> For example:
> - Q1 & Q2 are child queues under queueA
> - Q1 has 20% of the configured capacity and 5% used-capacity, and
> - Q2 has 80% of the configured capacity and 8% used-capacity.
> In this situation, the relative used-capacities are calculated as follows:
> - Relative used-capacity of Q1 is 5/20 = 0.25
> - Relative used-capacity of Q2 is 8/80 = 0.10
> In the above example, per today's Capacity Scheduler algorithm, Q2 is
> selected first by the scheduler to receive the next available resource.
> Simply ordering queues according to relative used-capacities sometimes causes
> trouble because scarce resources can be assigned to less-important apps first.
> # Latency sensitivity: This can be a problem for latency-sensitive
> applications, where waiting until the 'other' queue gets full is not going to
> cut it. The delay in scheduling is reflected directly in the response times of
> these applications.
> # Resource fragmentation for large-container apps: Today's algorithm also
> causes issues for applications that need very large containers. It is possible
> that existing queues are all within their resource guarantees, but their
> current allocation distribution on each node may be such that an application
> which needs a large container simply cannot fit on those nodes.
> Services:
> # The above problem (2) gets worse with long-running applications. With
> short-running apps, previous containers may eventually finish and make enough
> space for the apps with large containers. But with long-running services in
> the cluster, the large-container application may never get resources on any
> node even though its demands are not yet met.
> # Long-running services are sometimes pickier w.r.t. placement than normal
> batch apps. For example, a long-running service in a separate queue (say
> queue=service) may want to launch instances on 50% of the cluster nodes during
> peak hours. On each node, it may want to launch a large container, say
> 200G of memory per container.
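The ordering described in the issue above can be reproduced with a small sketch (illustrative only, not CapacityScheduler source; the names and types are assumptions):

{code}
// Illustrative sketch of the current ordering rule: the child queue with the lowest
// relative used-capacity (used / configured) is offered the next resource first.
import java.util.Comparator;
import java.util.List;

public class QueueOrderingDemo {
  record Queue(String name, double configuredCapacity, double usedCapacity) {
    double relativeUsedCapacity() {
      return usedCapacity / configuredCapacity;
    }
  }

  public static void main(String[] args) {
    List<Queue> children = List.of(
        new Queue("Q1", 0.20, 0.05),   // 0.05 / 0.20 = 0.25
        new Queue("Q2", 0.80, 0.08));  // 0.08 / 0.80 = 0.10

    // Per today's algorithm, Q2 wins because its relative used-capacity is lower,
    // regardless of how important the pending applications in Q1 are.
    Queue next = children.stream()
        .min(Comparator.comparingDouble(Queue::relativeUsedCapacity))
        .orElseThrow();
    System.out.println("Next queue to receive resources: " + next.name());
  }
}
{code}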






[jira] [Updated] (YARN-6067) Applications API Service HA

2017-01-06 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-6067:

Summary: Applications API Service HA  (was: API Service HA)

> Applications API Service HA
> ---
>
> Key: YARN-6067
> URL: https://issues.apache.org/jira/browse/YARN-6067
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>
> We need to start thinking about HA for the API Service. How do we achieve it?
> Should the API Service become part of the RM process to get a lot of things for
> free? Or should there be some other strategy? We need to start the discussion.






[jira] [Updated] (YARN-6067) Applications API Service HA

2017-01-06 Thread Gour Saha (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gour Saha updated YARN-6067:

Description: We need to start thinking about HA for the Applications API 
Service. How do we achieve it? Should API Service become part of the RM process 
to get a lot of things for free? Should there be some other strategy. We need 
to start the discussion.  (was: We need to start thinking about HA for API 
Service. How do we achieve it? Should API Service become part of the RM process 
to get a lot of things for free? Should there be some other strategy. We need 
to start the discussion.)

> Applications API Service HA
> ---
>
> Key: YARN-6067
> URL: https://issues.apache.org/jira/browse/YARN-6067
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>
> We need to start thinking about HA for the Applications API Service. How do
> we achieve it? Should the API Service become part of the RM process to get a
> lot of things for free? Or should there be some other strategy? We need to
> start the discussion.





