[jira] [Commented] (YARN-9548) [Umbrella] Make YARN work well in elastic cloud environments

2019-08-13 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906068#comment-16906068
 ] 

Junping Du commented on YARN-9548:
--

+1. 
I am quite interested in the autoscaling part. For horizontal scaling, we can 
leverage graceful decommission (YARN-914) to decommission/recommission nodes 
based on metrics monitoring. For vertical scaling, we can leverage dynamic 
resource allocation (YARN-291) to have a min/max resource setting on each node 
and update it according to each node's resource profile.
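
For example, a rough sketch of how this could be driven from the admin CLI 
(node IDs and timeout values below are placeholders; this assumes the graceful 
decommission and dynamic resource commands introduced by YARN-914/YARN-291):
{noformat}
# Horizontal: after adding nodes to the excludes file, gracefully decommission
# them, waiting up to one hour for running containers to finish.
yarn rmadmin -refreshNodes -g 3600 -client

# Vertical: adjust a single node's resources at runtime (memory in MB, vCores).
yarn rmadmin -updateNodeResource host1.example.com:45454 8192 8
{noformat}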

> [Umbrella] Make YARN work well in elastic cloud environments
> 
>
> Key: YARN-9548
> URL: https://issues.apache.org/jira/browse/YARN-9548
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Priority: Major
>
> YARN works well in static environments, but there isn't anything fundamentally 
> broken in YARN to stop us from making it work well in dynamic environments like 
> the cloud (public or private) as well.
> There are a few areas where we need to invest, though:
>  # Autoscaling
>  -- cluster level: add/remove nodes intelligently based on metrics and/or 
> admin plugins
>  -- node level: scale nodes up/down vertically?
>  # Smarter scheduling
> -- to pack containers as opposed to spreading them around to account for 
> nodes going away
> -- to account for speculative nodes like spot instances
>  # Handling nodes going away better
> -- by decommissioning sanely
> -- dealing with auxiliary services data
>  # And any installation helpers in this dynamic world - scripts, operators 
> etc.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766686#comment-16766686
 ] 

Junping Du commented on YARN-999:
-

bq. I am not sure how exactly the reduction of node resources is implemented, 
but for the opportunistic containers, you can kill stuff locally at the NMs. So 
if you need to free up resources due to resource reduction, you can go over the 
opportunistic containers running and kill the long-running ones.
So far, the reduction of node resources won't kill any containers; it just waits 
until containers finish - quite old behavior, since there was no long-running 
service support when the feature was first implemented. 

I think we need a generic policy here that can pick containers to balloon 
out resources according to some cost - opportunistic/guaranteed could be one 
dimension, but we could count others as well: container size, running time, etc.
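
For illustration only, a minimal sketch of what such a cost-based selection 
could look like (the ContainerInfo type and the cost weights below are 
hypothetical placeholders, not YARN APIs):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a cost-based victim-selection policy. */
public class BalloonPolicySketch {

  // Minimal stand-in for the container attributes mentioned above.
  static class ContainerInfo {
    boolean opportunistic;  // opportunistic vs. guaranteed
    long memoryMb;          // container size
    long runningTimeMs;     // how long it has been running

    ContainerInfo(boolean opportunistic, long memoryMb, long runningTimeMs) {
      this.opportunistic = opportunistic;
      this.memoryMb = memoryMb;
      this.runningTimeMs = runningTimeMs;
    }

    // Lower cost = cheaper to preempt/suspend first; the weights are arbitrary.
    double preemptionCost() {
      double cost = opportunistic ? 0 : 1000;  // prefer opportunistic victims
      cost += memoryMb / 1024.0;               // bigger containers lose more work
      cost += runningTimeMs / 60_000.0;        // long-running containers lose more progress
      return cost;
    }
  }

  /** Pick containers, cheapest first, until the requested memory is reclaimed. */
  static List<ContainerInfo> selectVictims(List<ContainerInfo> running,
                                           long memToReclaimMb) {
    List<ContainerInfo> victims = new ArrayList<>();
    running.sort(Comparator.comparingDouble(ContainerInfo::preemptionCost));
    long reclaimed = 0;
    for (ContainerInfo c : running) {
      if (reclaimed >= memToReclaimMb) {
        break;
      }
      victims.add(c);
      reclaimed += c.memoryMb;
    }
    return victims;
  }

  public static void main(String[] args) {
    List<ContainerInfo> running = new ArrayList<>(List.of(
        new ContainerInfo(true, 2048, 10 * 60_000),
        new ContainerInfo(false, 4096, 120 * 60_000),
        new ContainerInfo(true, 1024, 60_000)));
    // Expect 2: the two opportunistic containers are picked before the guaranteed one.
    System.out.println(selectVictims(running, 3072).size());
  }
}
{code}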

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
> Attachments: YARN-291.000.patch
>
>
> In the current design and implementation, when we decrease a node's resources to 
> less than the resource consumption of the currently running tasks, those tasks can 
> still run to completion; no new tasks get assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for long-running 
> tasks it could be too slow for the resource setting to actually take effect, so 
> preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-1508) Document Dynamic Resource Configuration feature

2019-02-11 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-1508:


Assignee: (was: Junping Du)

> Document Dynamic Resource Configuration feature
> ---
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name ResourceOption is not easy to understand. Also, we 
> need to document more on the resource overcommit timeout and its use cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1508) Document Dynamic Resource Configuration feature

2019-02-11 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765631#comment-16765631
 ] 

Junping Du commented on YARN-1508:
--

Sure, I have made it unassigned. Thanks for working on this, [~elgoiri]!

> Document Dynamic Resource Configuration feature
> ---
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name ResourceOption is not easy to understand. Also, we 
> need to document more on the resource overcommit timeout and its use cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6055) ContainersMonitorImpl need be adjusted when NM resource changed.

2019-02-11 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-6055:


Assignee: (was: Junping Du)

> ContainersMonitorImpl need be adjusted when NM resource changed.
> 
>
> Key: YARN-6055
> URL: https://issues.apache.org/jira/browse/YARN-6055
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> Per Ravi's comments in YARN-4832, we need to check some limits in 
> ContainersMonitorImpl to make sure they also get updated when the node resource is updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2489) ResouceOption's overcommitTimeout should be respected during resource update on NM

2019-02-11 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-2489:


Assignee: (was: Junping Du)

> ResouceOption's overcommitTimeout should be respected during resource update 
> on NM
> --
>
> Key: YARN-2489
> URL: https://issues.apache.org/jira/browse/YARN-2489
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> The ResourceOption used to update an NM's resource has two properties: Resource and 
> OvercommitTimeout. The latter property is used to guarantee that resources are 
> withdrawn after the timeout is hit when the resource is reduced to a value that the 
> current resource consumption exceeds. It currently uses the default value of -1, 
> which means no timeout, and we should make this property work when updating 
> NM resources.
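
As a sketch of how this would surface to admins (placeholder node ID and values; 
this assumes the optional trailing OvercommitTimeout argument documented for the 
rmadmin -updateNodeResource command):
{noformat}
# Reduce the node to 4 GB / 4 vCores; the trailing 60 is the OvercommitTimeout
# in seconds, after which over-allocated resources should be reclaimed.
yarn rmadmin -updateNodeResource host1.example.com:45454 4096 4 60
{noformat}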



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-11 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765630#comment-16765630
 ] 

Junping Du commented on YARN-999:
-

bq. I tried to go through the code to see where the overcommit timeout was used 
but I didn't get anywhere useful. Does anybody know if this is actually 
implemented?
No. YARN-2489 is supposed to cover it, but that hasn't been done yet.

bq. As this has been 6 years, I'd take over this if nobody is on it.
My bad. My priorities keep changing... Please feel free to take it. I will help 
with the review.

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Major
>
> In the current design and implementation, when we decrease a node's resources to 
> less than the resource consumption of the currently running tasks, those tasks can 
> still run to completion; no new tasks get assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for long-running 
> tasks it could be too slow for the resource setting to actually take effect, so 
> preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-11 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-999:
---

Assignee: (was: Junping Du)

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In the current design and implementation, when we decrease a node's resources to 
> less than the resource consumption of the currently running tasks, those tasks can 
> still run to completion; no new tasks get assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for long-running 
> tasks it could be too slow for the resource setting to actually take effect, so 
> preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-1000) Dynamic resource configuration feature can be configured to enable or disable and persistent on setting or not

2019-02-11 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-1000:


Assignee: (was: Junping Du)

> Dynamic resource configuration feature can be configured to enable or disable 
> and persistent on setting or not
> --
>
> Key: YARN-1000
> URL: https://issues.apache.org/jira/browse/YARN-1000
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
> Attachments: YARN-1000-sample.patch
>
>
> There are some configuration options for the dynamic resource configuration feature:
> 1. Enabled or not: if enabled, setting node resources at runtime through 
> CLI/REST/JMX can succeed; otherwise a "function not supported" exception 
> will be thrown. In the future, we may support enabling this feature only on a 
> subset of nodes that have resource flexibility (like virtual nodes).
> 2. Dynamic resource settings persistent or not: it depends on the user's 
> scenario whether a setting made at runtime should be kept after the 
> NM goes down and restarts.
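
Purely as an illustration of the two knobs described above, a hypothetical 
yarn-site.xml snippet might look like the following (these property names are 
made up for this sketch and are not actual YARN configuration keys):
{noformat}
<!-- Hypothetical property names, for illustration only -->
<property>
  <name>yarn.nodemanager.dynamic-resource.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.dynamic-resource.persist-setting</name>
  <value>false</value>
</property>
{noformat}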



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1508) Document Dynamic Resource Configuration feature

2019-02-11 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765617#comment-16765617
 ] 

Junping Du commented on YARN-1508:
--

Not yet, I think. [~elgoiri], please feel free to take it if you would like to 
work on this, and I will help with the review.

> Document Dynamic Resource Configuration feature
> ---
>
> Key: YARN-1508
> URL: https://issues.apache.org/jira/browse/YARN-1508
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Major
>
> Per Vinod's comment in 
> YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087)
>  and Bikas' comment in 
> YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615),
>  the name ResourceOption is not easy to understand. Also, we 
> need to document more on the resource overcommit timeout and its use cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-09-08 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6091:
-
Target Version/s:   (was: 2.8.5)

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the AppMaster 
> fails to register with the ResourceManager. But this didn't happen on other 
> servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is 
> running normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns 
> nonzero, YARN removes the AMRMToken, and then the AppMaster registration 
> fails because the ResourceManager has removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But per the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-07-17 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546329#comment-16546329
 ] 

Junping Du commented on YARN-5464:
--

Sure, I have unassigned it. Please feel free to take it.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Junping Du
>Priority: Major
> Attachments: YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-07-17 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-5464:


Assignee: (was: Junping Du)

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Major
> Attachments: YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6265) yarn.resourcemanager.fail-fast is used inconsistently

2018-07-09 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536673#comment-16536673
 ] 

Junping Du commented on YARN-6265:
--

Oops... I just found that my commit message doesn't include the JIRA number... noting it here 
for the coming release manager's reference.

> yarn.resourcemanager.fail-fast is used inconsistently
> -
>
> Key: YARN-6265
> URL: https://issues.apache.org/jira/browse/YARN-6265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Yuanbo Liu
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-6265.001.patch, YARN-6265.002.patch, 
> YARN-6265.003.patch
>
>
> In capacity scheduler, the property is used to control whether an app with 
> no/bad queue should be killed.  In the state store, the property controls 
> whether a state store op failure should cause the RM to exit in non-HA mode.  
> Those are two very different things, and they should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6265) yarn.resourcemanager.fail-fast is used inconsistently

2018-07-01 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529365#comment-16529365
 ] 

Junping Du commented on YARN-6265:
--

+1. 003 patch LGTM. Will commit it tomorrow if no further comments from others.

> yarn.resourcemanager.fail-fast is used inconsistently
> -
>
> Key: YARN-6265
> URL: https://issues.apache.org/jira/browse/YARN-6265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Yuanbo Liu
>Priority: Major
> Attachments: YARN-6265.001.patch, YARN-6265.002.patch, 
> YARN-6265.003.patch
>
>
> In capacity scheduler, the property is used to control whether an app with 
> no/bad queue should be killed.  In the state store, the property controls 
> whether a state store op failure should cause the RM to exit in non-HA mode.  
> Those are two very different things, and they should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6265) yarn.resourcemanager.fail-fast is used inconsistently

2018-06-29 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527270#comment-16527270
 ] 

Junping Du commented on YARN-6265:
--

Looks like the patch doesn't apply any more.
[~yuanbo], do you have bandwidth to update it?

> yarn.resourcemanager.fail-fast is used inconsistently
> -
>
> Key: YARN-6265
> URL: https://issues.apache.org/jira/browse/YARN-6265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Yuanbo Liu
>Priority: Major
> Attachments: YARN-6265.001.patch, YARN-6265.002.patch
>
>
> In capacity scheduler, the property is used to control whether an app with 
> no/bad queue should be killed.  In the state store, the property controls 
> whether a state store op failure should cause the RM to exit in non-HA mode.  
> Those are two very different things, and they should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6265) yarn.resourcemanager.fail-fast is used inconsistently

2018-06-26 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523583#comment-16523583
 ] 

Junping Du commented on YARN-6265:
--

Sorry for being late on this for a while. The v2 patch LGTM. Let me kick off 
Jenkins.

> yarn.resourcemanager.fail-fast is used inconsistently
> -
>
> Key: YARN-6265
> URL: https://issues.apache.org/jira/browse/YARN-6265
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Yuanbo Liu
>Priority: Major
> Attachments: YARN-6265.001.patch, YARN-6265.002.patch
>
>
> In capacity scheduler, the property is used to control whether an app with 
> no/bad queue should be killed.  In the state store, the property controls 
> whether a state store op failure should cause the RM to exit in non-HA mode.  
> Those are two very different things, and they should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6482) TestSLSRunner runs but doesn't executed jobs (.json parsing issue)

2018-06-26 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523264#comment-16523264
 ] 

Junping Du commented on YARN-6482:
--

Hi [~yuanbo], any plan to deliver a fix?

> TestSLSRunner runs but doesn't executed jobs (.json parsing issue)
> --
>
> Key: YARN-6482
> URL: https://issues.apache.org/jira/browse/YARN-6482
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Carlo Curino
>Assignee: Yuanbo Liu
>Priority: Minor
>
> TestSLSRunner runs correctly, bringing up an RM, but the parsing of the 
> rumen trace somehow fails silently, and no nodes or jobs are loaded. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466774#comment-16466774
 ] 

Junping Du commented on YARN-6091:
--

Moving to 2.8.5 as 2.8.4 is in the RC stage.

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the AppMaster 
> fails to register with the ResourceManager. But this didn't happen on other 
> servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is 
> running normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns 
> nonzero, YARN removes the AMRMToken, and then the AppMaster registration 
> fails because the ResourceManager has removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But per the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-05-07 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.5  (was: 2.8.4)

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the AppMaster 
> fails to register with the ResourceManager. But this didn't happen on other 
> servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is 
> running normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns 
> nonzero, YARN removes the AMRMToken, and then the AppMaster registration 
> fails because the ResourceManager has removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But per the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449445#comment-16449445
 ] 

Junping Du commented on YARN-7598:
--

+1. Latest patch LGTM. Committing.

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.5.patch, YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-13 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7598:
-
Target Version/s: 3.2.0, 3.1.1, 2.9.2, 3.0.3  (was: 3.2.0)

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, YARN-7598.4.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-04-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430229#comment-16430229
 ] 

Junping Du commented on YARN-7598:
--

Thanks [~xgong] for updating the patch! 
The patch looks generally good to me, but I have several comments:
bq. so users/admins don't have do anything for it to support NM restart.
"don't have do" => "don't have to do" or "don't need to do"

{noformat}
+   <property>
+   <name>yarn.nodemanager.aux-services.CustomAuxService.classpath</name>
+   <value>${local_dir_to_jar}/*</value>
+   </property>
+
+
{noformat}
The example above looks a bit misleading; it should point to a single jar or 
multiple jars regardless of whether the classpath is local or remote, shouldn't it? 
If so, it would be better to point to the same jar or jars in both cases. Also, what 
happens if local and remote classpaths are specified at the same time? Better to 
document that here.
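
For instance, a clarified version might show the local and remote variants side 
by side pointing at the same jar (a sketch only; I am assuming the remote-classpath 
property name used by this feature, so please double-check the exact keys in the doc):
{noformat}
<!-- Local classpath variant -->
<property>
  <name>yarn.nodemanager.aux-services.CustomAuxService.classpath</name>
  <value>/local/dir/to/jar/CustomAuxService.jar</value>
</property>

<!-- Remote classpath variant, pointing at the same jar on HDFS -->
<property>
  <name>yarn.nodemanager.aux-services.CustomAuxService.remote-classpath</name>
  <value>hdfs:///remote/dir/to/jar/CustomAuxService.jar</value>
</property>
{noformat}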

The rest looks good to me.

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.2.patch, YARN-7598.3.patch, 
> YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8118) Better utilize gracefully decommissioning node managers

2018-04-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429629#comment-16429629
 ] 

Junping Du commented on YARN-8118:
--

Thanks for contributing your idea and code, [~Karthik Palaniappan]! 
As Jason mentioned above, our main goal here is to remove decommissioning nodes 
from service ASAP at the least cost of interrupting the progress that 
applications have already made (their existing running containers). In my opinion, in most 
cases there is no significant difference between containers scheduled by 
existing applications and those scheduled by new applications. If there is any, the right 
solution should be a priority/preemption mechanism between applications. In other 
words, we don't assume any priority difference between existing and new 
applications in our typical decommissioning cases.
However, in a pure cloud environment (like EMR, etc.), the scenario could be 
different - what I can imagine (please correct me if I am wrong) is: a user (also 
an admin from YARN's perspective) drops most workloads onto a dedicated YARN cluster 
and wishes the cluster could shrink to some minimal size later when the applications 
finish. If this is the case that the current design and code want to target, 
then we should take Jason's suggestion above and add a new cluster configuration 
or a new parameter for the graceful decommission CLI. 
We need to be careful here: the previous decommission-nodes operation is 
idempotent, so we need to figure out what it means if new applications get 
submitted between multiple operations and how to track them - I don't think the 
current code provides a way.

> Better utilize gracefully decommissioning node managers
> ---
>
> Key: YARN-8118
> URL: https://issues.apache.org/jira/browse/YARN-8118
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.8.2
> Environment: * Google Compute Engine (Dataproc)
>  * Java 8
>  * Hadoop 2.8.2 using client-mode graceful decommissioning
>Reporter: Karthik Palaniappan
>Priority: Major
> Attachments: YARN-8118-branch-2.001.patch
>
>
> Proposal design doc with background + details (please comment directly on 
> doc): 
> [https://docs.google.com/document/d/1hF2Bod_m7rPgSXlunbWGn1cYi3-L61KvQhPlY9Jk9Hk/edit#heading=h.ab4ufqsj47b7]
> tl;dr Right now, DECOMMISSIONING nodes must wait for in-progress applications 
> to complete before shutting down, but they cannot run new containers from 
> those in-progress applications. This is wasteful, particularly in 
> environments where you are billed by resource usage (e.g. EC2).
> Proposal: YARN should schedule containers from in-progress applications on 
> DECOMMISSIONING nodes, but should still avoid scheduling containers from new 
> applications. That will make in-progress applications complete faster and let 
> nodes decommission faster. Overall, this should be cheaper.
> I have a working patch without unit tests that's surprisingly just a few real 
> lines of code (patch 001). If folks are happy with the proposal, I'll write 
> unit tests and also write a patch targeted at trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-04-06 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428154#comment-16428154
 ] 

Junping Du commented on YARN-6091:
--

Moving to 2.8.4 as 2.8.3 was released last year.

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the AppMaster 
> fails to register with the ResourceManager. But this didn't happen on other 
> servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is 
> running normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns 
> nonzero, YARN removes the AMRMToken, and then the AppMaster registration 
> fails because the ResourceManager has removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But per the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6091) the AppMaster register failed when use Docker on LinuxContainer

2018-04-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6091:
-
Target Version/s: 2.8.4  (was: 2.8.3)

> the AppMaster register failed when use Docker on LinuxContainer 
> 
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, yarn
>Affects Versions: 2.8.1
> Environment: CentOS
>Reporter: zhengchenyu
>Assignee: Eric Badger
>Priority: Critical
> Attachments: YARN-6091.001.patch, YARN-6091.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, I found that the AppMaster 
> fails to register with the ResourceManager. But this didn't happen on other 
> servers. 
> I found that pclose (in container-executor.c) returns different values on 
> different servers, even though the process launched by popen is 
> running normally. Some servers return 0, and others return 13. 
> Because YARN regards the application as failed when pclose returns 
> nonzero, YARN removes the AMRMToken, and then the AppMaster registration 
> fails because the ResourceManager has removed this application's token. 
> In container-executor.c, the judgement condition is whether the return code 
> is zero. But per the pclose man page, only a return value of -1 
> indicates an error. So I changed the judgement condition, which solves this 
> problem. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-01-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343874#comment-16343874
 ] 

Junping Du commented on YARN-7598:
--

Thanks [~vinodkv] for the comments. I agree that's a reasonable suggestion. 
[~xgong], would you incorporate Vinod's comments above? Thanks!

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7468) Provide means for container network policy control

2018-01-09 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7468:
-
Priority: Major  (was: Minor)

> Provide means for container network policy control
> --
>
> Key: YARN-7468
> URL: https://issues.apache.org/jira/browse/YARN-7468
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Reporter: Clay B.
>Assignee: Xuan Gong
> Attachments: YARN-7468.trunk.1.patch, YARN-7468.trunk.1.patch, 
> YARN-7468.trunk.2.patch, YARN-7468.trunk.2.patch, YARN-7468.trunk.3.patch, 
> [YARN-7468] [Design] Provide means for container network policy control.pdf
>
>
> To prevent data exfiltration from a YARN cluster, it would be very helpful to 
> have "firewall" rules able to map to a user/queue's containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7598) Document how to use classpath isolation for aux-services in YARN

2018-01-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314118#comment-16314118
 ] 

Junping Du commented on YARN-7598:
--

+1. Patch LGTM. Will commit it tomorrow if no further review/comments.

> Document how to use classpath isolation for aux-services in YARN
> 
>
> Key: YARN-7598
> URL: https://issues.apache.org/jira/browse/YARN-7598
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-7598.trunk.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7673) ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster

2017-12-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299201#comment-16299201
 ] 

Junping Du commented on YARN-7673:
--

Thanks [~zjffdu] for trying the Hadoop shaded jar in a downstream project and 
reporting the issue. Having discussed this with [~bharatviswa], we think we may have missed 
some classes when wrapping up these shaded jars. If there are no objections, I will go 
ahead and create an umbrella JIRA to track the unfinished work for the Hadoop shaded 
client, in case more classes are found missing in real-world testing.

> ClassNotFoundException: 
> org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using 
> hadoop-client-minicluster
> --
>
> Key: YARN-7673
> URL: https://issues.apache.org/jira/browse/YARN-7673
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jeff Zhang
>
> I'd like to use hadoop-client-minicluster for a Hadoop downstream project, but 
> I encounter the following exception when starting the Hadoop minicluster. I 
> checked hadoop-client-minicluster, and it indeed does not have this class. Is 
> this something that was missed when packaging the published jar?
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/api/DistributedSchedulingAMProtocol
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.createResourceManager(MiniYARNCluster.java:851)
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceInit(MiniYARNCluster.java:285)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7466) ResourceRequest has a different default for allocationRequestId than Container

2017-12-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297652#comment-16297652
 ] 

Junping Du commented on YARN-7466:
--

Hi [~jianhe] and [~eyang], this problem seems to apply to branch-3.0 as 
well. Shall we commit the patches to branch-3.0 so they get released in 3.0.1?

> ResourceRequest has a different default for allocationRequestId than Container
> --
>
> Key: YARN-7466
> URL: https://issues.apache.org/jira/browse/YARN-7466
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Fix For: 3.1.0
>
> Attachments: YARN-7466.001.patch, YARN-7466.addendum.001.patch
>
>
> The default value of allocationRequestId is inconsistent.
> It is  -1 in {{ContainerProto}} but 0 in {{ResourceRequestProto}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7673) ClassNotFoundException: org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using hadoop-client-minicluster

2017-12-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16297637#comment-16297637
 ] 

Junping Du commented on YARN-7673:
--

I think we may have missed adding yarn-server-api as a dependency of 
hadoop-client-minicluster. [~busbey] and [~bharatviswa], I think we should add 
it. Thoughts?

> ClassNotFoundException: 
> org.apache.hadoop.yarn.server.api.DistributedSchedulingAMProtocol when using 
> hadoop-client-minicluster
> --
>
> Key: YARN-7673
> URL: https://issues.apache.org/jira/browse/YARN-7673
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jeff Zhang
>
> I'd like to use hadoop-client-minicluster for a Hadoop downstream project, but 
> I encounter the following exception when starting the Hadoop minicluster. I 
> checked hadoop-client-minicluster, and it indeed does not have this class. Is 
> this something that was missed when packaging the published jar?
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/yarn/server/api/DistributedSchedulingAMProtocol
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.createResourceManager(MiniYARNCluster.java:851)
>   at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceInit(MiniYARNCluster.java:285)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1628#comment-1628
 ] 

Junping Du edited comment on YARN-7646 at 12/13/17 9:58 PM:


[~sunilg], my test here is using MR over the distributed cache, which is a 
prerequisite for supporting rolling upgrade, just as my configuration above 
shows. I think you were testing a different scenario?


was (Author: djp):
[~sunil.gov...@gmail.com], my test here is using MR over the distributed cache, 
which is a prerequisite for supporting rolling upgrade, just as my 
configuration above shows. I think you were testing a different scenario.

> MR job (based on old version tarball) get failed due to incompatible resource 
> request
> -
>
> Key: YARN-7646
> URL: https://issues.apache.org/jira/browse/YARN-7646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Junping Du
>Priority: Blocker
>
> With a quick workaround for HDFS-12920 (setting a non-time-unit value in 
> hdfs-site.xml), the job still fails with the following error:
> {noformat}
> 2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=-1, maxMemory=8192
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> 

[jira] [Commented] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1628#comment-1628
 ] 

Junping Du commented on YARN-7646:
--

[~sunil.gov...@gmail.com], my test here is using MR over the distributed cache, 
which is a prerequisite for supporting rolling upgrade, just as my 
configuration above shows. I think you were testing a different scenario.

> MR job (based on old version tarball) get failed due to incompatible resource 
> request
> -
>
> Key: YARN-7646
> URL: https://issues.apache.org/jira/browse/YARN-7646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Junping Du
>Priority: Blocker
>
> With a quick workaround for HDFS-12920 (setting a non-time-unit value in 
> hdfs-site.xml), the job still fails with the following error:
> {noformat}
> 2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=-1, maxMemory=8192
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy81.allocate(Unknown Source)
>   at 
> 

[jira] [Commented] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289971#comment-16289971
 ] 

Junping Du commented on YARN-7646:
--

No difference. I still hit the same issue. Below is my configuration in 
mapred-site.xml:
{noformat}
<configuration>

  <property>
    <name>mapreduce.job.user.name</name>
    <value>${user.name}</value>
  </property>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.application.framework.path</name>
    <value>hdfs:/mapreduce/2.8/hadoop-2.8.3.tar.gz#mr-framework</value>
  </property>

  <property>
    <name>mapreduce.application.classpath</name>
    <value>$PWD/mr-framework/hadoop-2.8.3/share/hadoop/mapreduce/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/mapreduce/lib/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/common/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/common/lib/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/yarn/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/yarn/lib/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/hdfs/*,$PWD/mr-framework/hadoop-2.8.3/share/hadoop/hdfs/lib/*</value>
  </property>

</configuration>
{noformat}
Am I missing something?

> MR job (based on old version tarball) get failed due to incompatible resource 
> request
> -
>
> Key: YARN-7646
> URL: https://issues.apache.org/jira/browse/YARN-7646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Junping Du
>Priority: Blocker
>
> With quick workaround with fixing HDFS-12920 (set non time unit to 
> hdfs-site.xml), the job still get failed with following error:
> {noformat}
> 2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=-1, maxMemory=8192
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> 

[jira] [Commented] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289952#comment-16289952
 ] 

Junping Du commented on YARN-7646:
--

I was trying the tarball of 2.9.0. Let me try 2.8.3 instead to see if there is any difference.

> MR job (based on old version tarball) get failed due to incompatible resource 
> request
> -
>
> Key: YARN-7646
> URL: https://issues.apache.org/jira/browse/YARN-7646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Junping Du
>Priority: Blocker
>
> With quick workaround with fixing HDFS-12920 (set non time unit to 
> hdfs-site.xml), the job still get failed with following error:
> {noformat}
> 2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=-1, maxMemory=8192
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy81.allocate(Unknown Source)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:206)
>   at 
> 

[jira] [Commented] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-12 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288566#comment-16288566
 ] 

Junping Du commented on YARN-7646:
--

CC [~andrew.wang], [~vinodkv], [~jlowe].

> MR job (based on old version tarball) get failed due to incompatible resource 
> request
> -
>
> Key: YARN-7646
> URL: https://issues.apache.org/jira/browse/YARN-7646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Junping Du
>Priority: Blocker
>
> With quick workaround with fixing HDFS-12920 (set non time unit to 
> hdfs-site.xml), the job still get failed with following error:
> {noformat}
> 2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=-1, maxMemory=8192
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
>   at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy81.allocate(Unknown Source)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:783)
>   

[jira] [Created] (YARN-7646) MR job (based on old version tarball) get failed due to incompatible resource request

2017-12-12 Thread Junping Du (JIRA)
Junping Du created YARN-7646:


 Summary: MR job (based on old version tarball) get failed due to 
incompatible resource request
 Key: YARN-7646
 URL: https://issues.apache.org/jira/browse/YARN-7646
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Junping Du
Priority: Blocker


With a quick workaround for HDFS-12920 (setting the value without a time unit in 
hdfs-site.xml), the job still fails with the following error:
{noformat}
2017-12-12 16:39:13,105 ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested memory < 0, or requested memory > max configured, 
requestedMemory=-1, maxMemory=8192
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:275)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:240)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:256)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:246)
at 
org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:388)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy81.allocate(Unknown Source)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:206)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:783)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:280)
at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:279)
at java.lang.Thread.run(Thread.java:745)
{noformat}
It looks like an incompatible change in the communication between the old MR client 
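
For illustration, here is a minimal, hedged sketch of the check the stack trace points at: the RM rejects any request whose memory is negative or above the configured maximum, which is what a requestedMemory of -1 from an old-version client trips over. The method and message mirror the trace; the body is a simplification, not the actual SchedulerUtils code.
{code}
// Hedged sketch, not the real SchedulerUtils implementation: the validation
// that produces "requested memory < 0, or requested memory > max configured".
public final class ResourceRequestValidationSketch {
  public static void validateMemory(long requestedMemory, long maxMemory) {
    if (requestedMemory < 0 || requestedMemory > maxMemory) {
      throw new IllegalArgumentException(
          "Invalid resource request, requested memory < 0, or requested memory"
          + " > max configured, requestedMemory=" + requestedMemory
          + ", maxMemory=" + maxMemory);
    }
  }

  public static void main(String[] args) {
    try {
      // An old-version client that leaves the memory field unset ends up
      // sending -1, which the new RM rejects, as in the stack trace above.
      validateMemory(-1, 8192);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}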

[jira] [Commented] (YARN-7496) CS Intra-queue preemption user-limit calculations are not in line with LeafQueue user-limit calculations

2017-12-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277801#comment-16277801
 ] 

Junping Du commented on YARN-7496:
--

Merged to branch-2.8.3, given we previously set the fix version to 2.8.3.

> CS Intra-queue preemption user-limit calculations are not in line with 
> LeafQueue user-limit calculations
> 
>
> Key: YARN-7496
> URL: https://issues.apache.org/jira/browse/YARN-7496
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.8.3
>
> Attachments: YARN-7496.001.branch-2.8.patch
>
>
> Only a problem in 2.8.
> Preemption could oscillate due to the difference in how user limit is 
> calculated between 2.8 and later releases.
> Basically (ignoring ULF, MULP, and maybe others), the calculation for user 
> limit on the Capacity Scheduler side in 2.8 is {{total used resources / 
> number of active users}} while the calculation in later releases is {{total 
> active resources / number of active users}}. When intra-queue preemption was 
> backported to 2.8, it's calculations for user limit were more aligned with 
> the latter algorithm, which is in 2.9 and later releases.
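
A hedged worked example of the difference described above (the numbers are made up for illustration, not taken from this JIRA):
{code}
// Hedged illustration of the two user-limit calculations described above;
// the numbers are invented for the example.
public class UserLimitExample {
  public static void main(String[] args) {
    long totalUsedMB = 20 * 1024;    // total resources used in the queue
    long totalActiveMB = 15 * 1024;  // portion used by active users' apps
    int activeUsers = 3;

    long userLimit28 = totalUsedMB / activeUsers;    // 2.8 LeafQueue style: ~6826 MB
    long userLimit29 = totalActiveMB / activeUsers;  // 2.9+ style:           5120 MB

    // The 2.8 intra-queue preemption backport computed a limit closer to the
    // 2.9+ number while the 2.8 scheduler enforced the 2.8 number, so the two
    // sides could disagree and preemption could oscillate.
    System.out.println("2.8 user limit (MB): " + userLimit28);
    System.out.println("2.9+ user limit (MB): " + userLimit29);
  }
}
{code}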



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7469) Capacity Scheduler Intra-queue preemption: User can starve if newest app is exactly at user limit

2017-12-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277799#comment-16277799
 ] 

Junping Du commented on YARN-7469:
--

Merged to branch-2.8.3, given we previously set the fix version to 2.8.3.

> Capacity Scheduler Intra-queue preemption: User can starve if newest app is 
> exactly at user limit
> -
>
> Key: YARN-7469
> URL: https://issues.apache.org/jira/browse/YARN-7469
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 2.8.3, 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: UnitTestToShowStarvedUser.patch, YARN-7469.001.patch
>
>
> Queue Configuration:
> - Total Memory: 20GB
> - 2 Queues
> -- Queue1
> --- Memory: 10GB
> --- MULP: 10%
> --- ULF: 2.0
> - Minimum Container Size: 0.5GB
> Use Case:
> - User1 submits app1 to Queue1 and consumes 20GB
> - User2 submits app2 to Queue1 and requests 7.5GB
> - Preemption monitor preempts 7.5GB from app1. Capacity Scheduler gives those 
> resources to User2
> - User 3 submits app3 to Queue1. To begin with, app3 is requesting 1 
> container for the AM.
> - Preemption monitor never preempts a container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7586) Application Placement should be done before ACL checks in ResourceManager

2017-11-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7586:
-
Target Version/s: 3.1.0

> Application Placement should be done before ACL checks in ResourceManager
> -
>
> Key: YARN-7586
> URL: https://issues.apache.org/jira/browse/YARN-7586
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Blocker
> Attachments: YARN-7586.1.patch
>
>
> YARN-7419 moved Application placement from RMAppManager to RMAppImpl which 
> causes issues since ApplicationSubmissionContext still has the original queue 
> specified by the user and not the mapped queue . This causes issues while 
> doing ACL checks in RMAppManager



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7558) "yarn logs" command fails to get logs for running containers if UI authentication is enabled.

2017-11-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271586#comment-16271586
 ] 

Junping Du commented on YARN-7558:
--

Thanks for the reply, Xuan! Agreed that it is not easy to add a UT, and the fix 
looks straightforward. +1. Will commit it shortly if there are no further comments.

> "yarn logs" command fails to get logs for running containers if UI 
> authentication is enabled.
> -
>
> Key: YARN-7558
> URL: https://issues.apache.org/jira/browse/YARN-7558
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-7558.1.patch, YARN-7558.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7558) "yarn logs" command fails to get logs for running containers if UI authentication is enabled.

2017-11-28 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269672#comment-16269672
 ] 

Junping Du commented on YARN-7558:
--

Thanks [~nmaheshwari] for reporting the issue and [~xgong] for delivering a 
patch. The patch looks OK to me in general. [~xgong], is it possible to add a 
UT to cover this case?

> "yarn logs" command fails to get logs for running containers if UI 
> authentication is enabled.
> -
>
> Key: YARN-7558
> URL: https://issues.apache.org/jira/browse/YARN-7558
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Namit Maheshwari
>Assignee: Xuan Gong
>Priority: Critical
> Attachments: YARN-7558.1.patch, YARN-7558.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7558) YARN log command fails to get logs for running containers if the url authentication is enabled.

2017-11-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7558:
-
Target Version/s: 2.9.1, 3.0.1

> YARN log command fails to get logs for running containers if the url 
> authentication is enabled.
> ---
>
> Key: YARN-7558
> URL: https://issues.apache.org/jira/browse/YARN-7558
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-914) (Umbrella) Support graceful decommission of nodemanager

2017-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256106#comment-16256106
 ] 

Junping Du commented on YARN-914:
-

Client-side graceful decommission work has been done, together with proper 
documentation, so we can claim that part of the goal has been achieved. I think we 
should separate the server-side decommission work into phase two, covering the HA 
issues, JSON format issues, and other enhancements, which will help keep the list 
cleaner. If nobody is against it, I will create a new umbrella JIRA (and a new branch) 
and move all open JIRAs under that one.

> (Umbrella) Support graceful decommission of nodemanager
> ---
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: graceful
>Affects Versions: 2.0.4-alpha
>Reporter: Luke Lu
>Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf, 
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Further more, for finished map tasks, if their 
> map output are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256093#comment-16256093
 ] 

Junping Du commented on YARN-5464:
--

I am working on a design doc for this and will upload it for review soon.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Junping Du
> Attachments: YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2017-11-16 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-5464:


Assignee: Junping Du

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Assignee: Junping Du
> Attachments: YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7517) RegistryDNS does not work with secure ZooKeeper

2017-11-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256033#comment-16256033
 ] 

Junping Du commented on YARN-7517:
--

CC [~jianhe], [~billie.rinaldi].

> RegistryDNS does not work with secure ZooKeeper
> ---
>
> Key: YARN-7517
> URL: https://issues.apache.org/jira/browse/YARN-7517
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Eric Yang
>
> Jaas configuration is missing when RegistryDNS attempts to talk to secure 
> ZooKeeper.  This task is to generate the Client Jaas configuration to talk to 
> secure ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6078) Containers stuck in Localizing state

2017-11-14 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6078:
-
Fix Version/s: 2.10.0

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Assignee: Billie Rinaldi
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: YARN-6078-branch-2.001.patch, YARN-6078.001.patch, 
> YARN-6078.002.patch, YARN-6078.003.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER that the localizer starts. It seems 
> that the output stream pipe of the process is still not closed (even though 
> the localizer process is no longer present).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6078) Containers stuck in Localizing state

2017-11-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252163#comment-16252163
 ] 

Junping Du commented on YARN-6078:
--

+1 on the branch-2 patch. I have committed the patch to trunk, branch-3.0, branch-2, 
and branch-2.9. Thanks [~billie.rinaldi] for the patch and [~bibinchundatt] for the 
review!

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Assignee: Billie Rinaldi
> Fix For: 3.0.0, 3.1.0, 2.10.0, 2.9.1
>
> Attachments: YARN-6078-branch-2.001.patch, YARN-6078.001.patch, 
> YARN-6078.002.patch, YARN-6078.003.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER that the localizer starts. It seems 
> that the output stream pipe of the process is still not closed (even though 
> the localizer process is no longer present).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (YARN-6078) Containers stuck in Localizing state

2017-11-14 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6078:
-
Fix Version/s: 2.9.1

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Assignee: Billie Rinaldi
> Fix For: 3.0.0, 3.1.0, 2.9.1
>
> Attachments: YARN-6078-branch-2.001.patch, YARN-6078.001.patch, 
> YARN-6078.002.patch, YARN-6078.003.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER that the localizer starts. It seems 
> that the output stream pipe of the process is still not closed (even though 
> the localizer process is no longer present).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6078) Containers stuck in Localizing state

2017-11-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250478#comment-16250478
 ] 

Junping Du commented on YARN-6078:
--

Thanks [~bibinchundatt] for the review and comments. +1 on the 03 patch as well. Bumping 
up the priority to critical, given we have hit this problem with serious impact. 
Committing.

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Assignee: Billie Rinaldi
> Attachments: YARN-6078.001.patch, YARN-6078.002.patch, 
> YARN-6078.003.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER that the localizer starts. It seems 
> that the output stream pipe of the process is still not closed (even though 
> the localizer process is no longer present).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: 

[jira] [Commented] (YARN-6078) Containers stuck in Localizing state

2017-11-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246713#comment-16246713
 ] 

Junping Du commented on YARN-6078:
--

Thanks [~billie.rinaldi] for updating the patch! A quick question here:
bq. The new patch only propagates the interrupt when a shell hasn't 
successfully been destroyed.
What is the impact of {{super.interrupt();}} in the case where the shell process does 
get destroyed? As you said, it may prevent the rest of the cleanup for destroying the 
process from being performed. Is there any side effect if we skip it entirely?

Other than this question, the patch looks good to me.
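
To make the question concrete, here is a hedged sketch of the pattern under discussion; it is not the actual patch, and {{destroyShell()}} is a hypothetical helper used only for illustration:
{code}
// Hedged sketch of "only propagate the interrupt when the shell hasn't been
// destroyed"; not the actual YARN-6078 change, and destroyShell() is a
// hypothetical stand-in for however the localizer shell is torn down.
class LocalizerRunnerSketch extends Thread {
  private volatile Process shellProcess; // set once the localizer shell starts

  @Override
  public void interrupt() {
    // If the shell was destroyed successfully, skip super.interrupt() so the
    // remaining cleanup in run() is not cut short; otherwise fall back to the
    // normal thread interrupt.
    if (!destroyShell()) {
      super.interrupt();
    }
  }

  private boolean destroyShell() {
    Process p = shellProcess;
    if (p == null) {
      return false;
    }
    p.destroy();          // ask the shell subprocess to exit
    return !p.isAlive();  // treat "no longer alive" as successfully destroyed
  }
}
{code}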

> Containers stuck in Localizing state
> 
>
> Key: YARN-6078
> URL: https://issues.apache.org/jira/browse/YARN-6078
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jagadish
>Assignee: Billie Rinaldi
> Attachments: YARN-6078.001.patch, YARN-6078.002.patch
>
>
> I encountered an interesting issue in one of our Yarn clusters (where the 
> containers are stuck in localizing phase).
> Our AM requests a container, and starts a process using the NMClient.
> According to the NM the container is in LOCALIZING state:
> {code}
> 1. 2017-01-09 22:06:18,362 [INFO] [AsyncDispatcher event handler] 
> container.ContainerImpl.handle(ContainerImpl.java:1135) - Container 
> container_e03_1481261762048_0541_02_60 transitioned from NEW to LOCALIZING
> 2017-01-09 22:06:18,363 [INFO] [AsyncDispatcher event handler] 
> localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:711)
>  - Created localizer for container_e03_1481261762048_0541_02_60
> 2017-01-09 22:06:18,364 [INFO] [LocalizerRunner for 
> container_e03_1481261762048_0541_02_60] 
> localizer.ResourceLocalizationService$LocalizerRunner.writeCredentials(ResourceLocalizationService.java:1191)
>  - Writing credentials to the nmPrivate file 
> /../..//.nmPrivate/container_e03_1481261762048_0541_02_60.tokens. 
> Credentials list:
> {code}
> According to the RM the container is in RUNNING state:
> {code}
> 2017-01-09 22:06:17,110 [INFO] [IPC Server handler 19 on 8030] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ALLOCATED to ACQUIRED
> 2017-01-09 22:06:19,084 [INFO] [ResourceManager Event Processor] 
> rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:410) - 
> container_e03_1481261762048_0541_02_60 Container Transitioned from 
> ACQUIRED to RUNNING
> {code}
> When I click the Yarn RM UI to view the logs for the container,  I get an 
> error
> that
> {code}
> No logs were found. state is LOCALIZING
> {code}
> The Node manager 's stack trace seems to indicate that the NM's 
> LocalizerRunner is stuck waiting to read from the sub-process's outputstream.
> {code}
> "LocalizerRunner for container_e03_1481261762048_0541_02_60" #27007081 
> prio=5 os_prio=0 tid=0x7fa518849800 nid=0x15f7 runnable 
> [0x7fa5076c3000]
>java.lang.Thread.State: RUNNABLE
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:255)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>   - locked <0xc6dc9c50> (a 
> java.lang.UNIXProcess$ProcessPipeInputStream)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:161)
>   at java.io.BufferedReader.read1(BufferedReader.java:212)
>   at java.io.BufferedReader.read(BufferedReader.java:286)
>   - locked <0xc6dc9c78> (a java.io.InputStreamReader)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:786)
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:568)
>   at org.apache.hadoop.util.Shell.run(Shell.java:479)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:237)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1113)
> {code}
> I did a {code}ps aux{code} and confirmed that there was no container-executor 
> process running with INITIALIZE_CONTAINER that the localizer starts. It seems 
> that the output stream pipe of the process is still not 

[jira] [Commented] (YARN-5422) ContainerLocalizer log should be logged in separate log file.

2017-11-09 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1624#comment-1624
 ] 

Junping Du commented on YARN-5422:
--

CC [~suma.shivaprasad]. Suma, does your previous work on separating out the container 
log cover the case here?

> ContainerLocalizer log should be logged in separate log file.
> -
>
> Key: YARN-5422
> URL: https://issues.apache.org/jira/browse/YARN-5422
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.7.1
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>
> We should set the log4j for the ContainerLocalizer jvm. Currently it will use 
> the NM log4j and  it will log the logs in NM hadoop.log file.
> If NM user and application user is different, then ContainerLocalizer will 
> not be able to log in hadoop.log file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7400) incorrect log preview displayed in jobhistory server ui

2017-11-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236713#comment-16236713
 ] 

Junping Du commented on YARN-7400:
--

Thanks [~subru]. I forgot to mark the JIRA as fixed but have committed it to 
trunk, branch-3.0, branch-2 and branch-2.9. Thanks all for the contribution!

> incorrect log preview displayed in jobhistory server ui
> ---
>
> Key: YARN-7400
> URL: https://issues.apache.org/jira/browse/YARN-7400
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>Priority: Major
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7400.1.patch
>
>
> In the job history server UI, if we enable the new log format, the container 
> log preview is displayed incorrectly; e.g., launch_container.sh displays 
> stderr logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7400) incorrect log preview displayed in jobhistory server ui

2017-10-31 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227508#comment-16227508
 ] 

Junping Du commented on YARN-7400:
--

Thanks [~xgong] for delivering the fix. The fix is straightforward, so it 
should be OK without a unit test.
+1. Will commit it shortly.

> incorrect log preview displayed in jobhistory server ui
> ---
>
> Key: YARN-7400
> URL: https://issues.apache.org/jira/browse/YARN-7400
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
> Attachments: YARN-7400.1.patch
>
>
> In the job history server UI, if we enable the new log format, the container 
> log preview is displayed incorrectly; e.g., launch_container.sh displays 
> stderr logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7400) incorrect log preview displayed in jobhistory server ui

2017-10-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225982#comment-16225982
 ] 

Junping Du commented on YARN-7400:
--

Hey [~subru], all of the new-log-format related patches have landed on 2.9, so we 
plan to enable the new log format (and other log related enhancements) in 2.9, and 
users can start using it from 2.9. 
Given this is really a small fix, it shouldn't actually block the 2.9 release.

> incorrect log preview displayed in jobhistory server ui
> ---
>
> Key: YARN-7400
> URL: https://issues.apache.org/jira/browse/YARN-7400
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-7400.1.patch
>
>
> In the job history server UI, if we enable the new log format, the container 
> log preview is displayed incorrectly; e.g., launch_container.sh displays 
> stderr logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7400) incorrect log preview displayed in jobhistory server ui

2017-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7400:
-
Target Version/s: 2.9.0, 3.0.0, 3.1.0

> incorrect log preview displayed in jobhistory server ui
> ---
>
> Key: YARN-7400
> URL: https://issues.apache.org/jira/browse/YARN-7400
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Santhosh B Gowda
>Assignee: Xuan Gong
>Priority: Blocker
>
> In the job history server UI, if we enable the new log format, the container 
> log preview is displayed incorrectly; e.g., launch_container.sh displays 
> stderr logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6930) Admins should be able to explicitly enable specific LinuxContainerRuntime in the NodeManager

2017-10-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225652#comment-16225652
 ] 

Junping Du commented on YARN-6930:
--

bq. It has to be explicitly turned on in 2.9. In 2.8 it was turned on by 
default.
If so, I agree that we should mark this fix in 2.9 as incompatible, in case 
users upgrade from 2.8 to 2.9 with the assumption that the docker runtime is on 
by default.


> Admins should be able to explicitly enable specific LinuxContainerRuntime in 
> the NodeManager
> 
>
> Key: YARN-6930
> URL: https://issues.apache.org/jira/browse/YARN-6930
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Shane Kumpf
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6930.001.patch, YARN-6930.002.patch, 
> YARN-6930.003.patch, YARN-6930.004.patch, YARN-6930.005.patch, 
> YARN-6930.006.patch, YARN-6930.branch-2.001.patch, 
> YARN-6930.branch-2.002.patch, YARN-6930.branch-2.8.001.patch, 
> YARN-6930.branch-2.8.002.patch, YARN-6930.branch-2.8.2.001.patch
>
>
> Today, in the java land, all LinuxContainerRuntimes are always enabled when 
> using LinuxContainerExecutor and the user can simply invoke anything that 
> he/she wants - default, docker, java-sandbox.
> We should have a way for admins to explicitly enable only specific runtimes 
> that he/she decides for the cluster. And by default, we should have 
> everything other than the default one disabled.
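A sketch of what explicit enablement could look like on the admin side; the property name yarn.nodemanager.runtime.linux.allowed-runtimes is taken as an assumption here, and it is set programmatically only for illustration:
{code}
import org.apache.hadoop.conf.Configuration;

public class AllowedRuntimesExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Only the runtimes listed here would be usable; docker/java-sandbox stay disabled
    // unless the admin adds them explicitly, e.g. "default,docker".
    conf.setStrings("yarn.nodemanager.runtime.linux.allowed-runtimes", "default");
    System.out.println(conf.get("yarn.nodemanager.runtime.linux.allowed-runtimes"));
  }
}
{code}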



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6930) Admins should be able to explicitly enable specific LinuxContainerRuntime in the NodeManager

2017-10-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225274#comment-16225274
 ] 

Junping Du commented on YARN-6930:
--

Hi [~vvasudev], thanks for the comments. It sounds like 2.8.2 and 2.9 have the same 
configuration from my quick check. Given that 2.8.2 is the first stable release of 
2.8, it doesn't sound incompatible with the 2.9 release. Am I missing something here?

> Admins should be able to explicitly enable specific LinuxContainerRuntime in 
> the NodeManager
> 
>
> Key: YARN-6930
> URL: https://issues.apache.org/jira/browse/YARN-6930
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Shane Kumpf
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6930.001.patch, YARN-6930.002.patch, 
> YARN-6930.003.patch, YARN-6930.004.patch, YARN-6930.005.patch, 
> YARN-6930.006.patch, YARN-6930.branch-2.001.patch, 
> YARN-6930.branch-2.002.patch, YARN-6930.branch-2.8.001.patch, 
> YARN-6930.branch-2.8.002.patch, YARN-6930.branch-2.8.2.001.patch
>
>
> Today, in the java land, all LinuxContainerRuntimes are always enabled when 
> using LinuxContainerExecutor and the user can simply invoke anything that 
> he/she wants - default, docker, java-sandbox.
> We should have a way for admins to explicitly enable only specific runtimes 
> that he/she decides for the cluster. And by default, we should have 
> everything other than the default one disabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7230) Document DockerContainerRuntime for branch-2.8 with proper scope and claim as an experimental feature

2017-10-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7230:
-
Fix Version/s: 2.8.2

> Document DockerContainerRuntime for branch-2.8 with proper scope and claim as 
> an experimental feature
> -
>
> Key: YARN-7230
> URL: https://issues.apache.org/jira/browse/YARN-7230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.1
>Reporter: Junping Du
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 2.8.2
>
> Attachments: YARN-7230.branch-2.8.001.patch, 
> YARN-7230.branch-2.8.002.patch, YARN-7230.branch-2.8.003.patch
>
>
> YARN-5258 documents the new docker container runtime feature that has already 
> been checked in to trunk/branch-2. We need a similar one for branch-2.8. 
> However, given that we are missing several patches, we need to define a 
> narrower scope for these features/improvements that matches the patches 
> already landed in 2.8. Also, like YARN-6622, we should document it as experimental.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7230) Document DockerContainerRuntime for branch-2.8 with proper scope and claim as an experimental feature

2017-10-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208254#comment-16208254
 ] 

Junping Du commented on YARN-7230:
--

Thanks [~dan...@cloudera.com] and [~ebadger] for the review. The patch LGTM as 
well. Will go ahead and commit it to the 2.8.x branches if there are no further comments.

> Document DockerContainerRuntime for branch-2.8 with proper scope and claim as 
> an experimental feature
> -
>
> Key: YARN-7230
> URL: https://issues.apache.org/jira/browse/YARN-7230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.1
>Reporter: Junping Du
>Assignee: Shane Kumpf
>Priority: Blocker
>  Labels: ready-to-commit
> Attachments: YARN-7230.branch-2.8.001.patch, 
> YARN-7230.branch-2.8.002.patch, YARN-7230.branch-2.8.003.patch
>
>
> YARN-5258 documents the new docker container runtime feature that has already 
> been checked in to trunk/branch-2. We need a similar one for branch-2.8. 
> However, given that we are missing several patches, we need to define a 
> narrower scope for these features/improvements that matches the patches 
> already landed in 2.8. Also, like YARN-6622, we should document it as experimental.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7333) container-executor fails to remove entries from a directory that is not writable or executable

2017-10-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206731#comment-16206731
 ] 

Junping Du commented on YARN-7333:
--

Thanks [~jlowe] for the patch and [~nroberts] for review!

> container-executor fails to remove entries from a directory that is not 
> writable or executable
> --
>
> Key: YARN-7333
> URL: https://issues.apache.org/jira/browse/YARN-7333
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1, 2.8.2
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Critical
> Fix For: 2.9.0, 2.8.2, 2.8.3, 3.0.0
>
> Attachments: YARN-7333.001.patch, YARN-7333.002.patch
>
>
> Similar to the situation from YARN-4594, container-executor will fail to 
> clean up directories that do not have write and execute permissions.  YARN-4594 
> fixed the scenario where the directory is not readable, 
> but it missed the case where we can open the directory but either not 
> traverse it (i.e.: no execute permission) or cannot remove entries from 
> within it (i.e.: no write permissions).
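A sketch of the general technique, shown in Java purely for illustration (the real fix is in the native container-executor code): restore owner write and execute bits on each directory before trying to remove its entries.
{code}
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;

public class ForceDelete {
  public static void deleteRecursively(Path root) throws IOException {
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        // Without write+execute on the directory we can neither traverse it nor unlink its entries.
        Files.setPosixFilePermissions(dir, EnumSet.of(
            PosixFilePermission.OWNER_READ,
            PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.OWNER_EXECUTE));
        return FileVisitResult.CONTINUE;
      }
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }
      @Override
      public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
{code}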



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7124) LogAggregationTFileController deletes/renames while file is open

2017-10-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206360#comment-16206360
 ] 

Junping Du commented on YARN-7124:
--

Sorry for coming to this late, as I am just back from vacation. The patch LGTM. +1. 
Will commit it shortly.

> LogAggregationTFileController deletes/renames while file is open
> 
>
> Key: YARN-7124
> URL: https://issues.apache.org/jira/browse/YARN-7124
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Daryn Sharp
>Assignee: Jason Lowe
>Priority: Critical
> Attachments: YARN-7124.001.patch
>
>
> YARN-6288 changed the log aggregation writer to be an AutoCloseable.  
> Unfortunately the try-with-resources block for the writer will either rename 
> or delete the log while it is still open.
> Assuming the NM's behavior is correct, deleting open files only results in 
> ominous WARNs in the nodemanager log and increases the rate of logging in the 
> NN when the implicit try-with-resources close fails.  These red herrings 
> complicate debugging efforts.
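A generic Java sketch of the ordering problem described above, using the local filesystem purely for illustration (the actual writer goes through the TFile log aggregation controller on HDFS): the rename or delete must not happen while the implicit close is still pending.
{code}
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CloseBeforeRename {
  // Anti-pattern: the move happens inside try-with-resources, i.e. while the file is
  // still open; the implicit close() then runs against a file that was already moved.
  static void renameWhileOpen(Path tmp, Path dst) throws IOException {
    try (Writer w = Files.newBufferedWriter(tmp)) {
      w.write("aggregated logs");
      Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  // Safer ordering: finish and close the writer first, then rename.
  static void closeThenRename(Path tmp, Path dst) throws IOException {
    try (Writer w = Files.newBufferedWriter(tmp)) {
      w.write("aggregated logs");
    }
    Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}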



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183683#comment-16183683
 ] 

Junping Du commented on YARN-6570:
--

Sorry for the late reply, as I have been on vacation recently, and thanks for the 
nice catch, [~jlowe]! I will upload a new patch after I am back.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7248) NM returns new SCHEDULED container status to older clients

2017-09-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183654#comment-16183654
 ] 

Junping Du commented on YARN-7248:
--

I think we should fix this in 3.0-beta as well, given that rolling upgrade from 2.x is 
the goal. CC [~andrew.wang].

> NM returns new SCHEDULED container status to older clients
> --
>
> Key: YARN-7248
> URL: https://issues.apache.org/jira/browse/YARN-7248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Jason Lowe
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7248.001.patch, YARN-7248.002.patch, 
> YARN-7248.003.patch
>
>
> YARN-4597 added a new SCHEDULED container state and that state is returned to 
> clients when the container is localizing, etc.  However the client may be 
> running on an older software version that does not have the new SCHEDULED 
> state which could lead the client to crash on the unexpected container state 
> value or make incorrect assumptions like any state != NEW and != RUNNING must 
> be COMPLETED which was true in the older version.
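One possible compatibility shim, sketched here only to illustrate the problem (the enums are simplified stand-ins, and this is not necessarily the committed fix): when the client's protocol version predates SCHEDULED, report a state the old enum already knows about.
{code}
public class ContainerStateCompat {
  // Simplified stand-ins for the real enums, for illustration only.
  enum NewState { NEW, SCHEDULED, RUNNING, COMPLETE }
  enum LegacyState { NEW, RUNNING, COMPLETE }

  static LegacyState toLegacy(NewState s) {
    switch (s) {
      case NEW:
      case SCHEDULED:   // not yet running, so map it to a pre-running state old clients know
        return LegacyState.NEW;
      case RUNNING:
        return LegacyState.RUNNING;
      default:
        return LegacyState.COMPLETE;
    }
  }
}
{code}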



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7248) NM returns new SCHEDULED container status to older clients

2017-09-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7248:
-
Target Version/s: 2.9.0  (was: 2.9.0, 3.0.0-beta1)

> NM returns new SCHEDULED container status to older clients
> --
>
> Key: YARN-7248
> URL: https://issues.apache.org/jira/browse/YARN-7248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Jason Lowe
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7248.001.patch, YARN-7248.002.patch, 
> YARN-7248.003.patch
>
>
> YARN-4597 added a new SCHEDULED container state and that state is returned to 
> clients when the container is localizing, etc.  However the client may be 
> running on an older software version that does not have the new SCHEDULED 
> state which could lead the client to crash on the unexpected container state 
> value or make incorrect assumptions like any state != NEW and != RUNNING must 
> be COMPLETED which was true in the older version.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7248) NM returns new SCHEDULED container status to older clients

2017-09-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7248:
-
Target Version/s: 2.9.0, 3.0.0-beta1  (was: 2.9.0)

> NM returns new SCHEDULED container status to older clients
> --
>
> Key: YARN-7248
> URL: https://issues.apache.org/jira/browse/YARN-7248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Jason Lowe
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7248.001.patch, YARN-7248.002.patch, 
> YARN-7248.003.patch
>
>
> YARN-4597 added a new SCHEDULED container state and that state is returned to 
> clients when the container is localizing, etc.  However the client may be 
> running on an older software version that does not have the new SCHEDULED 
> state which could lead the client to crash on the unexpected container state 
> value or make incorrect assumptions like any state != NEW and != RUNNING must 
> be COMPLETED which was true in the older version.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6509) Add a size threshold beyond which yarn logs will require a force option

2017-09-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177580#comment-16177580
 ] 

Junping Du commented on YARN-6509:
--

Thanks [~xgong] for updating the patch. The v3 patch looks good to me overall. 
My only concern is: shall we make the 10G default size configurable (see the sketch 
below)? If not, it will be an incompatible change in the future when we want to 
update this value to something different - unless we are quite confident in this value.
Another minor issue is that we should be consistent in using "limit" instead of 
"limits" in naming and printed messages.
Everything else looks fine to me.
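A minimal sketch of the configuration knob suggested above; the property name is hypothetical and only the shape of the check matters:
{code}
import org.apache.hadoop.conf.Configuration;

public class LogFetchLimit {
  // Hypothetical property name, for illustration only.
  static final String LOG_FETCH_SIZE_LIMIT = "yarn.log.cli.fetch.size-limit-bytes";
  static final long DEFAULT_LIMIT_BYTES = 10L * 1024 * 1024 * 1024; // 10G default

  // The CLI would require its force option only when the total log size exceeds the limit.
  static boolean requiresForce(Configuration conf, long totalLogSizeBytes) {
    long limit = conf.getLong(LOG_FETCH_SIZE_LIMIT, DEFAULT_LIMIT_BYTES);
    return totalLogSizeBytes > limit;
  }
}
{code}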

> Add a size threshold beyond which yarn logs will require a force option
> ---
>
> Key: YARN-6509
> URL: https://issues.apache.org/jira/browse/YARN-6509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-6509.1.patch, YARN-6509.2.patch, YARN-6509.3.patch
>
>
> An accidental fetch for a long-running application can lead to a scenario in 
> which the large log size fills up a disk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2017-09-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177381#comment-16177381
 ] 

Junping Du commented on YARN-6623:
--

Hi [~vvasudev], do you agree with what we discussed in the security alias thread, 
i.e. to only fix a small (but very important) issue instead of backporting the whole 
of YARN-6623? If so, I will go ahead and file a new JIRA to unblock the 2.8.2 release.

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch, YARN-6623.007.patch, YARN-6623.008.patch, 
> YARN-6623.009.patch, YARN-6623.010.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6570) No logs were found for running application, running container

2017-09-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6570:
-
Fix Version/s: (was: 2.8.2)
   2.8.3

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.3, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177219#comment-16177219
 ] 

Junping Du commented on YARN-6570:
--

Thanks [~xgong]. Corrected it to 2.8.3 since it landed on branch-2.8 only.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.3, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177197#comment-16177197
 ] 

Junping Du commented on YARN-6570:
--

The findbugs warning is not related to the patch here. I think we are good to go for 
branch-2.8. [~xgong], would you review it again?

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2017-09-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175908#comment-16175908
 ] 

Junping Du commented on YARN-6623:
--

bq. I plan to commit the patch to trunk/branch-3.0 tomorrow if no more opposite 
opinions.
I think branch-2 also needs this, as all docker related patches should also land 
on branch-2. CC [~asuresh].

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch, YARN-6623.007.patch, YARN-6623.008.patch, 
> YARN-6623.009.patch, YARN-6623.010.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175566#comment-16175566
 ] 

Junping Du commented on YARN-6570:
--

The failed tests pass locally. Triggering another Jenkins run.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2017-09-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175386#comment-16175386
 ] 

Junping Du commented on YARN-6623:
--

bq. We have already documented that the docker feature is alpha, not for 
production use (and documenting more). Given this, I don't think we should add 
more risk to 2.8.2.
That was also my initial thinking, but [~shaneku...@gmail.com] convinced me 
offline that this is important for 2.8.2 even as an alpha feature - it is indeed 
still alpha for 2.9 and 3.0 and seems to affect the non-docker container runtimes 
as well. So I have changed my mind and support this backport. [~vvasudev], 
[~shaneku...@gmail.com] and [~miklos.szeg...@cloudera.com], what do you guys 
think?

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch, YARN-6623.007.patch, YARN-6623.008.patch, 
> YARN-6623.009.patch, YARN-6623.010.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7118) AHS REST API can return NullPointerException

2017-09-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174047#comment-16174047
 ] 

Junping Du commented on YARN-7118:
--

Thanks [~Prabhu Joseph] for reporting the issue and [~billie.rinaldi] for 
delivering the fix. The fix is quite straightforward. +1. Will commit it 
shortly.

> AHS REST API can return NullPointerException
> 
>
> Key: YARN-7118
> URL: https://issues.apache.org/jira/browse/YARN-7118
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Prabhu Joseph
>Assignee: Billie Rinaldi
> Attachments: YARN-7118.01.patch
>
>
> ApplicationHistoryService REST Api returns NullPointerException
> {code}
> [prabhu@prabhu2 root]$ curl --negotiate -u: 'http:// IP>:8188/ws/v1/applicationhistory/apps?queue=test'
> {"exception":"NullPointerException","javaClassName":"java.lang.NullPointerException"}
> {code}
> TimelineServer logs shows below.
> {code}
> 2017-08-17 17:54:54,128 WARN  webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.webapp.WebServices.getApps(WebServices.java:191)
> at 
> org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSWebServices.getApps(AHSWebServices.java:96)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7230) Document DockerContainerRuntime for branch-2.8 with proper scope and claim as an experimental feature

2017-09-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173891#comment-16173891
 ] 

Junping Du commented on YARN-7230:
--

Thanks [~shaneku...@gmail.com]!

> Document DockerContainerRuntime for branch-2.8 with proper scope and claim as 
> an experimental feature
> -
>
> Key: YARN-7230
> URL: https://issues.apache.org/jira/browse/YARN-7230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.8.1
>Reporter: Junping Du
>Assignee: Shane Kumpf
>Priority: Blocker
>
> YARN-5258 documents the new docker container runtime feature that has already 
> been checked in to trunk/branch-2. We need a similar one for branch-2.8. 
> However, given that we are missing several patches, we need to define a 
> narrower scope for these features/improvements that matches the patches 
> already landed in 2.8. Also, like YARN-6622, we should document it as experimental.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7230) Document DockerContainerRuntime for branch-2.8 with proper scope and claim as an experimental feature

2017-09-20 Thread Junping Du (JIRA)
Junping Du created YARN-7230:


 Summary: Document DockerContainerRuntime for branch-2.8 with 
proper scope and claim as an experimental feature
 Key: YARN-7230
 URL: https://issues.apache.org/jira/browse/YARN-7230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.8.1
Reporter: Junping Du
Priority: Blocker


YARN-5258 documents the new docker container runtime feature that has already been 
checked in to trunk/branch-2. We need a similar one for branch-2.8. However, given 
that we are missing several patches, we need to define a narrower scope for these 
features/improvements that matches the patches already landed in 2.8. Also, like 
YARN-6622, we should document it as experimental.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

2017-09-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173795#comment-16173795
 ] 

Junping Du commented on YARN-6623:
--

Thanks [~vvasudev] for working hard on this effort, and everyone for the review! 
Shall we backport this to branch-2.8 as well? Otherwise it could be a security 
hole for 2.8.2 as a production release.

> Add support to turn off launching privileged containers in the 
> container-executor
> -
>
> Key: YARN-6623
> URL: https://issues.apache.org/jira/browse/YARN-6623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch, YARN-6623.007.patch, YARN-6623.008.patch, 
> YARN-6623.009.patch, YARN-6623.010.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7034) DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client environment variables to container-executor

2017-09-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173515#comment-16173515
 ] 

Junping Du commented on YARN-7034:
--

Marking it as a blocker given the discussions in the security thread.

> DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client 
> environment variables to container-executor
> -
>
> Key: YARN-7034
> URL: https://issues.apache.org/jira/browse/YARN-7034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Blocker
> Attachments: YARN-7034.000.patch, YARN-7034.001.patch, 
> YARN-7034.002.patch, YARN-7034.003.patch, YARN-7034.004.patch, 
> YARN-7034.005.patch, YARN-7034.006.patch, YARN-7034.branch-2.000.patch, 
> YARN-7034.branch-2.004.patch, YARN-7034.branch-2.005.patch, 
> YARN-7034.branch-2.006.patch, YARN-7034.branch-2.8.000.patch, 
> YARN-7034.branch-2.8.004.patch, YARN-7034.branch-2.8.005.patch, 
> YARN-7034.branch-2.8.006.patch
>
>
> This behavior is unnecessary since there is nothing that is used from the 
> environment right now. One option is to whitelist these variables before 
> passing them. Are there any known use cases for this to justify?
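A sketch of the whitelisting idea mentioned in the description (the method and variable names are illustrative, not the actual NM code): only explicitly allowed variables would be forwarded to container-executor.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class EnvWhitelist {
  static Map<String, String> filter(Map<String, String> clientEnv, Set<String> allowed) {
    Map<String, String> filtered = new HashMap<>();
    for (Map.Entry<String, String> e : clientEnv.entrySet()) {
      // Drop everything the admin did not explicitly allow before handing the
      // environment to the privileged helper binary.
      if (allowed.contains(e.getKey())) {
        filtered.put(e.getKey(), e.getValue());
      }
    }
    return filtered;
  }
}
{code}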



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7034) DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client environment variables to container-executor

2017-09-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173514#comment-16173514
 ] 

Junping Du commented on YARN-7034:
--

Thanks [~miklos.szeg...@cloudera.com] for updating the patch and 
[~shaneku...@gmail.com] for verifying and reviewing! The patch LGTM as well. Any 
additional comments from others? If not, I will go ahead and commit this within 
the next 24 hours.

> DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client 
> environment variables to container-executor
> -
>
> Key: YARN-7034
> URL: https://issues.apache.org/jira/browse/YARN-7034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Critical
> Attachments: YARN-7034.000.patch, YARN-7034.001.patch, 
> YARN-7034.002.patch, YARN-7034.003.patch, YARN-7034.004.patch, 
> YARN-7034.005.patch, YARN-7034.006.patch, YARN-7034.branch-2.000.patch, 
> YARN-7034.branch-2.004.patch, YARN-7034.branch-2.005.patch, 
> YARN-7034.branch-2.006.patch, YARN-7034.branch-2.8.000.patch, 
> YARN-7034.branch-2.8.004.patch, YARN-7034.branch-2.8.005.patch, 
> YARN-7034.branch-2.8.006.patch
>
>
> This behavior is unnecessary since there is nothing that is used from the 
> environment right now. One option is to whitelist these variables before 
> passing them. Are there any known use cases for this to justify?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7034) DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client environment variables to container-executor

2017-09-20 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7034:
-
Priority: Blocker  (was: Critical)

> DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client 
> environment variables to container-executor
> -
>
> Key: YARN-7034
> URL: https://issues.apache.org/jira/browse/YARN-7034
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Blocker
> Attachments: YARN-7034.000.patch, YARN-7034.001.patch, 
> YARN-7034.002.patch, YARN-7034.003.patch, YARN-7034.004.patch, 
> YARN-7034.005.patch, YARN-7034.006.patch, YARN-7034.branch-2.000.patch, 
> YARN-7034.branch-2.004.patch, YARN-7034.branch-2.005.patch, 
> YARN-7034.branch-2.006.patch, YARN-7034.branch-2.8.000.patch, 
> YARN-7034.branch-2.8.004.patch, YARN-7034.branch-2.8.005.patch, 
> YARN-7034.branch-2.8.006.patch
>
>
> This behavior is unnecessary since there is nothing that is used from the 
> environment right now. One option is to whitelist these variables before 
> passing them. Are there any known use cases for this to justify?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-19 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172576#comment-16172576
 ] 

Junping Du commented on YARN-7196:
--

Sorry for coming to this late. The 002 patch is quite straightforward and LGTM 
overall. The build failure shouldn't be related, as my local build passes. The 
findbugs warning is also unrelated to the patch. The only nit is a checkstyle 
issue: we should remove two unused imports. I will remove them while committing 
the patch.
+1. Will commit it shortly.

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-7196.002.patch, YARN-7196.patch
>
>
> The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to 
> fail every once in a while. Maybe have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6570) No logs were found for running application, running container

2017-09-19 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6570:
-
Attachment: YARN-6570-branch-2.8.002.patch

Fixed the unit test failure and checkstyle warning in the 002 patch for branch-2.8.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, 
> YARN-6570-branch-2.8.002.patch, YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170800#comment-16170800
 ] 

Junping Du commented on YARN-6570:
--

Submitting the patch to trigger a Jenkins report.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, YARN-6570.poc.patch, 
> YARN-6570-v2.patch, YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6570) No logs were found for running application, running container

2017-09-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6570:
-
Attachment: YARN-6570-branch-2.8.001.patch

Uploaded a patch for branch-2.8. Basically, it adds the SCHEDULED container status 
before the container actually starts running, which keeps it consistent with 
trunk/branch-2.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570-branch-2.8.001.patch, YARN-6570.poc.patch, 
> YARN-6570-v2.patch, YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-6570) No logs were found for running application, running container

2017-09-18 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reopened YARN-6570:
--

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170763#comment-16170763
 ] 

Junping Du commented on YARN-6570:
--

Thanks [~xgong] for reviewing and committing the patch. I would like to upload a 
patch for branch-2.8 as well, so I will reopen the JIRA for a Jenkins run.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-beta1, 3.1.0
>
> Attachments: YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170332#comment-16170332
 ] 

Junping Du commented on YARN-7196:
--

Hi [~asuresh], sure, please take it up. :) Sorry for missing your earlier 
comments.

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
> Attachments: YARN-7196.patch
>
>
> The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to 
> fail every once in a while. Maybe have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6570) No logs were found for running application, running container

2017-09-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168614#comment-16168614
 ] 

Junping Du commented on YARN-6570:
--

BTW, the TestContainerManager failure is tracked in YARN-7196.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7196:
-
Attachment: YARN-7196.patch

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
> Attachments: YARN-7196.patch
>
>
> The Testcase {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to 
> fail every once in a while. Maybe have to change the way the event is 
> triggered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7196) Fix finicky TestContainerManager tests

2017-09-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168600#comment-16168600
 ] 

Junping Du commented on YARN-7196:
--

It fails consistently now, from YARN-6570 to YARN-7034. I took a quick look and 
found that the reason for the failure is that the container never gets a chance 
to run because the queue length is 0:
{noformat}
2017-09-15 15:08:53,159 INFO  [NM ContainerManager dispatcher] 
scheduler.ContainerScheduler (ContainerScheduler.java:enqueueContainer(392)) - 
Opportunistic container [container_0__01_00] will not be queued at the 
NMsince max queue length [0] has been reached
2017-09-15 15:08:53,162 INFO  [NM ContainerManager dispatcher] 
container.ContainerImpl (ContainerImpl.java:handle(1893)) - Container 
container_0__01_00 transitioned from SCHEDULED to KILLING
{noformat}
Attaching a simple patch to fix the test failure (and the log typo) here; a minimal 
sketch of the relevant test configuration follows.
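A minimal sketch of the kind of test setup involved (the property name is an assumption here, not taken from the patch): give the NM a non-zero opportunistic queue length so the container gets queued and eventually runs instead of being killed immediately.
{code}
import org.apache.hadoop.conf.Configuration;

public class TestQueueLengthSetup {
  static Configuration withOpportunisticQueue(Configuration conf) {
    // Allow up to 10 opportunistic containers to be queued at the NM in the test.
    conf.setInt("yarn.nodemanager.opportunistic-containers-max-queue-length", 10);
    return conf;
  }
}
{code}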

> Fix finicky TestContainerManager tests
> --
>
> Key: YARN-7196
> URL: https://issues.apache.org/jira/browse/YARN-7196
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: The Testcase 
> {{testContainerUpdateExecTypeGuaranteedToOpportunistic}} seem to fail every 
> once in a while. Maybe have to change the way the event is triggered.
>Reporter: Arun Suresh
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6570) No logs were found for running application, running container

2017-09-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6570:
-
Attachment: YARN-6570-v3.patch

Updated the patch to fix the test failures in TestEventFlow. 
The failure of TestContainerManager is not related to the patch. From my 
verification, it fails on trunk without the patch here.
TestNodeManagerResync does not fail on my local machine. It may just be an 
intermittent failure and shouldn't be related to the patch here.

> No logs were found for running application, running container
> -
>
> Key: YARN-6570
> URL: https://issues.apache.org/jira/browse/YARN-6570
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6570.poc.patch, YARN-6570-v2.patch, 
> YARN-6570-v3.patch
>
>
> 1. Obtain running containers from the following CLI for a running application:
>  yarn  container -list appattempt
> 2. Could not fetch logs 
> {code}
> Can not find any log file matching the pattern: ALL for the container
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7174) Add retry logic in LogsCLI when fetch running application logs

2017-09-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168542#comment-16168542
 ] 

Junping Du commented on YARN-7174:
--

Got it. The v2 patch LGTM. +1. Will commit it shortly.

> Add retry logic in LogsCLI when fetch running application logs
> --
>
> Key: YARN-7174
> URL: https://issues.apache.org/jira/browse/YARN-7174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-7174.1.patch, YARN-7174.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7174) Add retry logic in LogsCLI when fetch running application logs

2017-09-15 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-7174:
-
Target Version/s: 2.9.0, 3.1.0

> Add retry logic in LogsCLI when fetch running application logs
> --
>
> Key: YARN-7174
> URL: https://issues.apache.org/jira/browse/YARN-7174
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-7174.1.patch, YARN-7174.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7162) Remove XML excludes file format

2017-09-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167210#comment-16167210
 ] 

Junping Du commented on YARN-7162:
--

bq. Given that code freeze for Hadoop 3 beta 1 is tomorrow (Sept 15), perhaps 
we should compromise by removing this for now from branch-3.0 but leave it in 
trunk, and revisit this for 3.1.0?
+1. This sounds more reasonable. We can revisit this a couple of months from 
now to decide what to do with the XML format.

> Remove XML excludes file format
> ---
>
> Key: YARN-7162
> URL: https://issues.apache.org/jira/browse/YARN-7162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-7162.001.patch, YARN-7162.branch-2.001.patch
>
>
> YARN-5536 aims to replace the XML format for the excludes file with a JSON 
> format.  However, it looks like we won't have time for that for Hadoop 3 Beta 
> 1.  The concern is that if we release it as-is, we'll now have to support the 
> XML format as-is for all of Hadoop 3.x, which we're either planning on 
> removing, or rewriting using a pluggable framework.  
> [This comment in 
> YARN-5536|https://issues.apache.org/jira/browse/YARN-5536?focusedCommentId=16126194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16126194]
>  proposed two quick solutions to prevent this compat issue.  In this JIRA, 
> we're going to remove the XML format.  If we later want to add it back in, 
> YARN-5536 can add it back, rewriting it to be in the pluggable framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7162) Remove XML excludes file format

2017-09-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167043#comment-16167043
 ] 

Junping Du commented on YARN-7162:
--

bq. Maybe we do want to support the old format, JSON, and XML, but we may need 
to tweak the XML format incompatibly depending on how the JSON format ends up 
and how the pluggable code for it works; but we'd be stuck if we make a release 
with the existing XML in that case. Or maybe once we have the JSON format we 
don't want to worry about the XML format. In any case, the safest thing to do 
is to remove the XML format now before the release.
The JSON format is something Ming proposed to have. I gave it a +1 because I 
understood it as replacing or complementing the XML format, not as simply 
killing the XML one.
For me, JSON format support is nice to have but not a must. What I care more 
about here is verifying the functionality and stabilizing it. The cases you 
listed above miss an important one: none of us has the bandwidth to add the 
JSON format back, so we could lose this feature/functionality entirely. I would 
prefer to mark it as an alpha feature rather than remove it, which would cause 
an unnecessary feature regression.

> Remove XML excludes file format
> ---
>
> Key: YARN-7162
> URL: https://issues.apache.org/jira/browse/YARN-7162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-7162.001.patch, YARN-7162.branch-2.001.patch
>
>
> YARN-5536 aims to replace the XML format for the excludes file with a JSON 
> format.  However, it looks like we won't have time for that for Hadoop 3 Beta 
> 1.  The concern is that if we release it as-is, we'll now have to support the 
> XML format as-is for all of Hadoop 3.x, which we're either planning on 
> removing, or rewriting using a pluggable framework.  
> [This comment in 
> YARN-5536|https://issues.apache.org/jira/browse/YARN-5536?focusedCommentId=16126194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16126194]
>  proposed two quick solutions to prevent this compat issue.  In this JIRA, 
> we're going to remove the XML format.  If we later want to add it back in, 
> YARN-5536 can add it back, rewriting it to be in the pluggable framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7162) Remove XML excludes file format

2017-09-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167024#comment-16167024
 ] 

Junping Du commented on YARN-7162:
--

bq.  if you wanted to have nodeA do 50, nodeB do 70, etc, you can't do that 
without the XML or JSON format. I'm not sure how useful that is.
It really depends on how important we think this timeout is. In the beginning, 
we thought the timeout value didn't matter much, so we had simple, client-side 
tracking of timeout values. But later on, [~danzhi] from AWS claimed that the 
timeout value matters a lot and that we need a more reliable way to track it - 
through the RM. Also, multiple nodes with different timeout values is a case 
they want to cover, because being precise about each node's timeout may also be 
important. I can quickly think of some use cases: an admin could plan to 
decommission some nodes earlier than others, or some nodes' 
containers/applications (in the node label case) are more important/critical 
than those on other nodes, so it is worth waiting a bit longer for them, etc.
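To make the per-node case concrete, here is a toy sketch (hostnames and values are made up) of the mapping that a structured excludes format, whether XML or JSON, needs to express and that a flat hostname-only list cannot:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class PerNodeTimeoutSketch {
  public static void main(String[] args) {
    // Hypothetical per-node decommission timeouts, in seconds. A format with
    // a single global timeout cannot express different values per host.
    Map<String, Integer> decommissionTimeoutSecs = new HashMap<>();
    decommissionTimeoutSecs.put("nodeA.example.com", 50); // drain quickly
    decommissionTimeoutSecs.put("nodeB.example.com", 70); // critical/labeled workloads, wait longer
    decommissionTimeoutSecs.forEach((host, secs) ->
        System.out.println(host + " -> " + secs + "s"));
  }
}
{code}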


> Remove XML excludes file format
> ---
>
> Key: YARN-7162
> URL: https://issues.apache.org/jira/browse/YARN-7162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-7162.001.patch, YARN-7162.branch-2.001.patch
>
>
> YARN-5536 aims to replace the XML format for the excludes file with a JSON 
> format.  However, it looks like we won't have time for that for Hadoop 3 Beta 
> 1.  The concern is that if we release it as-is, we'll now have to support the 
> XML format as-is for all of Hadoop 3.x, which we're either planning on 
> removing, or rewriting using a pluggable framework.  
> [This comment in 
> YARN-5536|https://issues.apache.org/jira/browse/YARN-5536?focusedCommentId=16126194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16126194]
>  proposed two quick solutions to prevent this compat issue.  In this JIRA, 
> we're going to remove the XML format.  If we later want to add it back in, 
> YARN-5536 can add it back, rewriting it to be in the pluggable framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7162) Remove XML excludes file format

2017-09-14 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167004#comment-16167004
 ] 

Junping Du commented on YARN-7162:
--

Wait... sorry for coming in late on this.
I think it would be better to have JSON format support in place first and then 
safely remove the XML support. Is there any benefit we gain from removing it 
now? [~mingma] voted to remove XML format support but also to add a new JSON 
format, which hasn't been done yet.

> Remove XML excludes file format
> ---
>
> Key: YARN-7162
> URL: https://issues.apache.org/jira/browse/YARN-7162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Blocker
> Attachments: YARN-7162.001.patch, YARN-7162.branch-2.001.patch
>
>
> YARN-5536 aims to replace the XML format for the excludes file with a JSON 
> format.  However, it looks like we won't have time for that for Hadoop 3 Beta 
> 1.  The concern is that if we release it as-is, we'll now have to support the 
> XML format as-is for all of Hadoop 3.x, which we're either planning on 
> removing, or rewriting using a pluggable framework.  
> [This comment in 
> YARN-5536|https://issues.apache.org/jira/browse/YARN-5536?focusedCommentId=16126194=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16126194]
>  proposed two quick solutions to prevent this compat issue.  In this JIRA, 
> we're going to remove the XML format.  If we later want to add it back in, 
> YARN-5536 can add it back, rewriting it to be in the pluggable framework.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


