[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label

2019-09-25 Thread Zhe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938150#comment-16938150
 ] 

Zhe Zhang commented on YARN-9730:
-

+1 on the addendum patch

> Support forcing configured partitions to be exclusive based on app node label
> -
>
> Key: YARN-9730
> URL: https://issues.apache.org/jira/browse/YARN-9730
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>  Labels: release-blocker
> Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, 
> YARN-9730.001.patch, YARN-9730.002.patch, YARN-9730.003.patch
>
>
> Use case: queue X has all of its workload in non-default (exclusive) 
> partition P (by setting app submission context's node label set to P). Node 
> in partition Q != P heartbeats to RM. Capacity scheduler loops through every 
> application in X, and every scheduler key in this application, and fails to 
> allocate each time since the app's requested label and the node's label don't 
> match. This causes huge performance degradation when the number of apps in X is 
> large.
> To fix the issue, allow RM to configure partitions as "forced-exclusive". If 
> partition P is "forced-exclusive", then:
>  * 1a. If app sets its submission context's node label to P, all its resource 
> requests will be overridden to P
>  * 1b. If app sets its submission context's node label to Q, any of its resource 
> requests whose labels are P will be overridden to Q
>  * 2. In the scheduler, we add apps with node label expression P to a 
> separate data structure. When a node in partition P heartbeats to scheduler, 
> we only try to schedule apps in this data structure. When a node in partition 
> Q heartbeats to scheduler, we schedule the rest of the apps as normal.
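
For illustration, here is a minimal sketch of the label-override rules 1a/1b described above; the class and helper names are hypothetical, not the actual YARN-9730 patch:

{code}
import java.util.Set;

// Hypothetical sketch of the 1a/1b override rules; names are illustrative only.
public final class ForcedExclusiveLabelOverride {
  /**
   * Returns the node label a resource request should actually use, given the
   * app's submission-context label and the set of "forced-exclusive" partitions.
   */
  static String overrideLabel(String appLabel, String requestLabel,
      Set<String> forcedExclusivePartitions) {
    // 1a: the app was submitted to a forced-exclusive partition P,
    //     so every request is overridden to P.
    if (forcedExclusivePartitions.contains(appLabel)) {
      return appLabel;
    }
    // 1b: the app was submitted to Q, but this request asks for a
    //     forced-exclusive partition P; override it back to Q.
    if (forcedExclusivePartitions.contains(requestLabel)) {
      return appLabel;
    }
    return requestLabel;
  }
}
{code}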






[jira] [Commented] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters

2019-09-17 Thread Zhe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931752#comment-16931752
 ] 

Zhe Zhang commented on YARN-8849:
-

[~jhung] has been continuing this work internally. We plan to share an update 
in the near future.

> DynoYARN: A simulation and testing infrastructure for YARN clusters
> ---
>
> Key: YARN-8849
> URL: https://issues.apache.org/jira/browse/YARN-8849
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Jonathan Hung
>Priority: Major
>
> Traditionally, YARN workload simulation is performed using SLS (Scheduler 
> Load Simulator), which is packaged with YARN. It essentially starts a 
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager* and 
> the *ApplicationMaster* containers. These simulators are lightweight and run in a 
> threadpool. The NM simulators do not open any external ports and send 
> (in-process) heartbeats to the ResourceManager.
> There are a couple of drawbacks with using the SLS:
>  * It might be difficult to simulate really large clusters without having 
> access to a very beefy box - since the NMs are launched as tasks in a 
> threadpool, and each NM has to send periodic heartbeats to the RM.
>  * Certain features (like YARN-1011) require changes to the NodeManager - 
> aspects such as queuing and selectively killing containers have to be 
> incorporated into the existing NM Simulator, which might make the simulator a 
> bit heavyweight - there is a need for locking and synchronization.
>  * Since the NM and AM are simulations, only the Scheduler is faithfully 
> tested - it does not really perform an end-2-end test of a cluster.
> Therefore, drawing inspiration from 
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a 
> YARN-deployable YARN cluster framework - *DynoYARN* - for testing, with the 
> following features:
>  * The NM already has hooks to plug in a custom *ContainerExecutor* and 
> *NodeResourceMonitor*. If we can plug in a custom *ContainersMonitorImpl* 
> monitoring thread (and other modules like the LocalizationService), we can 
> probably inject an Executor that does not actually launch containers, and a 
> Node and Container resource monitor that reports synthetic, pre-specified 
> utilization metrics back to the RM.
>  * Since we are launching fake containers, we cannot run normal AM 
> containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.
> Essentially, a test workflow would look like this:
>  * Launch a DynoYARN cluster.
>  * Use the Unmanaged AM feature to directly negotiate with the DynoYARN 
> Resource Manager for container tokens.
>  * Use the container tokens from the RM to directly ask the DynoYARN Node 
> Managers to start fake containers.
>  * The DynoYARN NodeManagers will start the fake containers and report to the 
> DynoYARN Resource Manager synthetically generated resource utilization for 
> the containers (which will be injected via the *ContainerLaunchContext* and 
> parsed by the plugged-in Container Executor).
>  * The Scheduler will use the utilization report to schedule containers - we 
> will be able to test allocation of *Opportunistic* containers based on 
> resource utilization.
>  * Since the DynoYARN Node Managers run the actual code paths, all preemption 
> and queuing logic will be faithfully executed.
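
For illustration, a rough sketch of the unmanaged-AM negotiation step in the workflow above, using the stock {{AMRMClient}} API; this is an assumption about what a DynoYARN test driver might do, not code from the project:

{code}
// Illustrative fragment only: an unmanaged AM negotiating with the (Dyno)YARN RM
// for container allocations. App submission, registration details and credentials
// are omitted.
AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
rmClient.init(new YarnConfiguration());
rmClient.start();
rmClient.registerApplicationMaster("localhost", 0, "");

// Ask for one fake 1 GB / 1 vcore container anywhere in the cluster.
rmClient.addContainerRequest(new AMRMClient.ContainerRequest(
    Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

// Heartbeat until the RM hands back containers (with tokens) that can then be
// started on the DynoYARN NodeManagers.
List<Container> allocated = rmClient.allocate(0.0f).getAllocatedContainers();
{code}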






[jira] [Assigned] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters

2019-09-17 Thread Zhe Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned YARN-8849:
---

Assignee: Jonathan Hung  (was: Keqiu Hu)

> DynoYARN: A simulation and testing infrastructure for YARN clusters
> ---
>
> Key: YARN-8849
> URL: https://issues.apache.org/jira/browse/YARN-8849
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun Suresh
>Assignee: Jonathan Hung
>Priority: Major
>
> Traditionally, YARN workload simulation is performed using SLS (Scheduler 
> Load Simulator), which is packaged with YARN. It essentially starts a 
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager* and 
> the *ApplicationMaster* containers. These simulators are lightweight and run in a 
> threadpool. The NM simulators do not open any external ports and send 
> (in-process) heartbeats to the ResourceManager.
> There are a couple of drawbacks with using the SLS:
>  * It might be difficult to simulate really large clusters without having 
> access to a very beefy box - since the NMs are launched as tasks in a 
> threadpool, and each NM has to send periodic heartbeats to the RM.
>  * Certain features (like YARN-1011) require changes to the NodeManager - 
> aspects such as queuing and selectively killing containers have to be 
> incorporated into the existing NM Simulator, which might make the simulator a 
> bit heavyweight - there is a need for locking and synchronization.
>  * Since the NM and AM are simulations, only the Scheduler is faithfully 
> tested - it does not really perform an end-2-end test of a cluster.
> Therefore, drawing inspiration from 
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a 
> YARN-deployable YARN cluster framework - *DynoYARN* - for testing, with the 
> following features:
>  * The NM already has hooks to plug in a custom *ContainerExecutor* and 
> *NodeResourceMonitor*. If we can plug in a custom *ContainersMonitorImpl* 
> monitoring thread (and other modules like the LocalizationService), we can 
> probably inject an Executor that does not actually launch containers, and a 
> Node and Container resource monitor that reports synthetic, pre-specified 
> utilization metrics back to the RM.
>  * Since we are launching fake containers, we cannot run normal AM 
> containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.
> Essentially, a test workflow would look like this:
>  * Launch a DynoYARN cluster.
>  * Use the Unmanaged AM feature to directly negotiate with the DynoYARN 
> Resource Manager for container tokens.
>  * Use the container tokens from the RM to directly ask the DynoYARN Node 
> Managers to start fake containers.
>  * The DynoYARN NodeManagers will start the fake containers and report to the 
> DynoYARN Resource Manager synthetically generated resource utilization for 
> the containers (which will be injected via the *ContainerLaunchContext* and 
> parsed by the plugged-in Container Executor).
>  * The Scheduler will use the utilization report to schedule containers - we 
> will be able to test allocation of *Opportunistic* containers based on 
> resource utilization.
>  * Since the DynoYARN Node Managers run the actual code paths, all preemption 
> and queuing logic will be faithfully executed.






[jira] [Commented] (YARN-9409) Port resource type changes from YARN-7237 to branch-3.0/branch-2

2019-03-27 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803044#comment-16803044
 ] 

Zhe Zhang commented on YARN-9409:
-

+1

> Port resource type changes from YARN-7237 to branch-3.0/branch-2
> 
>
> Key: YARN-9409
> URL: https://issues.apache.org/jira/browse/YARN-9409
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9409-YARN-8200.branch3.001.patch
>
>







[jira] [Commented] (YARN-9272) Backport YARN-7738 for refreshing max allocation for multiple resource types

2019-03-27 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803039#comment-16803039
 ] 

Zhe Zhang commented on YARN-9272:
-

+1

> Backport YARN-7738 for refreshing max allocation for multiple resource types
> 
>
> Key: YARN-9272
> URL: https://issues.apache.org/jira/browse/YARN-9272
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9272-YARN-8200.001.patch, 
> YARN-9272-YARN-8200.branch3.001.patch, YARN-9272-YARN-8200.branch3.002.patch
>
>
> Need to port to YARN-8200.branch3 (for branch-3.0) and YARN-8200 (for 
> branch-2)






[jira] [Commented] (YARN-9291) Backport YARN-7637 to branch-2

2019-03-20 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797705#comment-16797705
 ] 

Zhe Zhang commented on YARN-9291:
-

+1

> Backport YARN-7637 to branch-2
> --
>
> Key: YARN-9291
> URL: https://issues.apache.org/jira/browse/YARN-9291
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9291-YARN-8200.001.patch
>
>







[jira] [Commented] (YARN-9271) Backport YARN-6927 for resource type support in MapReduce

2019-03-20 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797636#comment-16797636
 ] 

Zhe Zhang commented on YARN-9271:
-

+1. I attempted cherry-picking locally and verified that indeed only the above 
3 test files need minor Java 8-related changes.

> Backport YARN-6927 for resource type support in MapReduce
> -
>
> Key: YARN-9271
> URL: https://issues.apache.org/jira/browse/YARN-9271
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9271-YARN-8200.001.patch
>
>
> This is already in branch-3.0. Need to port it to YARN-8200/branch-2.






[jira] [Commented] (YARN-9397) Fix empty NMResourceInfo object test failures in branch-2

2019-03-18 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795511#comment-16795511
 ] 

Zhe Zhang commented on YARN-9397:
-

+1, looks like a clean fix.

> Fix empty NMResourceInfo object test failures in branch-2
> -
>
> Key: YARN-9397
> URL: https://issues.apache.org/jira/browse/YARN-9397
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9397-YARN-8200.001.patch
>
>
> It appears the empty-object handling behavior changed between Jersey versions 
> (branch-2 is on Jersey 1.9, branch-3 on 1.19).






[jira] [Commented] (YARN-9175) Null resources check in ResourceInfo for branch-3.0

2019-01-03 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733505#comment-16733505
 ] 

Zhe Zhang commented on YARN-9175:
-

+1. Thanks [~jhung] for the fix.

> Null resources check in ResourceInfo for branch-3.0
> ---
>
> Key: YARN-9175
> URL: https://issues.apache.org/jira/browse/YARN-9175
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9175-branch-3.0.001.patch
>
>
> Check for null {{resources}} in the ResourceInfo class to avoid an NPE when 
> rendering the RM UI.
> This was fixed as part of YARN-7934, which didn't make it into 3.0 and below.
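
The fix pattern is a simple null guard in the getters; a hypothetical sketch (field and method names are illustrative, not the attached patch):

{code}
// Illustrative only: guard against a null resources field instead of throwing
// an NPE while the RM UI renders the queue/app pages.
public long getMemorySize() {
  return resources == null ? 0 : resources.getMemorySize();
}
{code}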






[jira] [Commented] (YARN-9085) Guaranteed and MaxCapacity CSQueueMetrics

2018-12-07 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713165#comment-16713165
 ] 

Zhe Zhang commented on YARN-9085:
-

+1 (binding)

Thanks [~jhung]! Latest patch LGTM; I like the 
{{updateConfiguredCapacityMetrics}} structure.

> Guaranteed and MaxCapacity CSQueueMetrics
> -
>
> Key: YARN-9085
> URL: https://issues.apache.org/jira/browse/YARN-9085
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.3
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9085.001.patch, YARN-9085.002.patch
>
>
> Would be useful to have Absolute Capacity/Absolute Max Capacity for queues to 
> compare against allocated/pending/etc.






[jira] [Commented] (YARN-9085) Guaranteed and MaxCapacity CSQueueMetrics

2018-12-05 Thread Zhe Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711044#comment-16711044
 ] 

Zhe Zhang commented on YARN-9085:
-

Thanks [~jhung]! This is a useful metric. Patch LGTM overall. 

+1 pending the following couple of nits:

# Why does {{setGuaranteedResources}} take a {{partition}} argument if it's 
always supposed to be null?
# Since you are updating configured capacities now, you should update the comment 
as well:
{code}
   * When nodePartition is null, all partition of
   * used-capacity/absolute-used-capacity will be updated
{code}

> Guaranteed and MaxCapacity CSQueueMetrics
> -
>
> Key: YARN-9085
> URL: https://issues.apache.org/jira/browse/YARN-9085
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.3
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9085.001.patch
>
>
> Would be useful to have Absolute Capacity/Absolute Max Capacity for queues to 
> compare against allocated/pending/etc.






[jira] [Commented] (YARN-7623) Fix the CapacityScheduler Queue configuration documentation

2018-03-27 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416315#comment-16416315
 ] 

Zhe Zhang commented on YARN-7623:
-

Thanks [~jhung]! The change looks good (I'm attaching a screenshot of the rendered 
markdown from my IDE).

+1 after new Jenkins run.

> Fix the CapacityScheduler Queue configuration documentation
> ---
>
> Key: YARN-7623
> URL: https://issues.apache.org/jira/browse/YARN-7623
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: Screen Shot 2018-03-27 at 3.02.45 PM.png, 
> YARN-7623.001.patch, YARN-7623.002.patch
>
>
> It looks like the [Changing Queue 
> Configuration|https://hadoop.apache.org/docs/r2.9.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Changing_queue_configuration_via_API]
>  section is mis-formatted.






[jira] [Updated] (YARN-7623) Fix the CapacityScheduler Queue configuration documentation

2018-03-27 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-7623:

Attachment: Screen Shot 2018-03-27 at 3.02.45 PM.png

> Fix the CapacityScheduler Queue configuration documentation
> ---
>
> Key: YARN-7623
> URL: https://issues.apache.org/jira/browse/YARN-7623
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Arun Suresh
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: Screen Shot 2018-03-27 at 3.02.45 PM.png, 
> YARN-7623.001.patch, YARN-7623.002.patch
>
>
> It looks like the [Changing Queue 
> Configuration|https://hadoop.apache.org/docs/r2.9.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Changing_queue_configuration_via_API]
>  section is mis-formatted.






[jira] [Updated] (YARN-7737) prelaunch.err file not found exception on container failure

2018-01-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-7737:

Hadoop Flags: Reviewed

Thanks [~jhung] for taking a look. I just committed to trunk. Backporting to 
other target versions now.

> prelaunch.err file not found exception on container failure
> ---
>
> Key: YARN-7737
> URL: https://issues.apache.org/jira/browse/YARN-7737
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.9.1, 3.0.1
>Reporter: Jonathan Hung
>Assignee: Keqiu Hu
>Priority: Major
> Attachments: YARN-7737.001.patch
>
>
> Hit this exception when a container failed:{noformat}2018-01-11 19:04:08,036 
> ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Failed to get tail of the container's prelaunch error log file
> java.io.FileNotFoundException: File 
> /grid/b/tmp/userlogs/application_1515190594800_1766/container_e39_1515190594800_1766_01_02/prelaunch.err
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:545)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitCode(ContainerLaunch.java:511)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:93)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){noformat}
> containerLogDir is picked on container launch via 
> {{LocalDirAllocator#getLocalPathForWrite}}, which is where it looks for 
> {{prelaunch.err}} when the container fails. But prelaunch.err (and 
> prelaunch.out) are created in the first log dir (in {{ContainerLaunch#call}}): 
> {noformat}exec.writeLaunchEnv(containerScriptOutStream, environment,
> localResources, launchContext.getCommands(),
> new Path(containerLogDirs.get(0)), user);{noformat}
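
In other words, the tail should be read from the same directory the file was written to; a one-line sketch of that direction (illustrative, not the committed patch):

{code}
// Illustrative only: look for prelaunch.err where ContainerLaunch#call wrote it
// (the first entry of containerLogDirs) rather than in the directory later chosen
// via LocalDirAllocator#getLocalPathForWrite.
Path prelaunchErr = new Path(containerLogDirs.get(0), "prelaunch.err");
{code}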






[jira] [Commented] (YARN-7737) prelaunch.err file not found exception on container failure

2018-01-19 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332614#comment-16332614
 ] 

Zhe Zhang commented on YARN-7737:
-

+1, looks to me a clear fix. Will wait for [~jhung] to take a look before 
committing.

> prelaunch.err file not found exception on container failure
> ---
>
> Key: YARN-7737
> URL: https://issues.apache.org/jira/browse/YARN-7737
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Keqiu Hu
>Priority: Major
> Attachments: YARN-7737.001.patch
>
>
> Hit this exception when a container failed:{noformat}2018-01-11 19:04:08,036 
> ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Failed to get tail of the container's prelaunch error log file
> java.io.FileNotFoundException: File 
> /grid/b/tmp/userlogs/application_1515190594800_1766/container_e39_1515190594800_1766_01_02/prelaunch.err
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:545)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitCode(ContainerLaunch.java:511)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:93)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){noformat}
> containerLogDir is picked on container launch via 
> {{LocalDirAllocator#getLocalPathForWrite}}, which is where it looks for 
> {{prelaunch.err}} when the container fails. But prelaunch.err (and 
> prelaunch.out) are created in the first log dir (in {{ContainerLaunch#call}}): 
> {noformat}exec.writeLaunchEnv(containerScriptOutStream, environment,
> localResources, launchContext.getCommands(),
> new Path(containerLogDirs.get(0)), user);{noformat}






[jira] [Assigned] (YARN-7737) prelaunch.err file not found exception on container failure

2018-01-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reassigned YARN-7737:
---

Assignee: Keqiu Hu  (was: Jonathan Hung)

> prelaunch.err file not found exception on container failure
> ---
>
> Key: YARN-7737
> URL: https://issues.apache.org/jira/browse/YARN-7737
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Keqiu Hu
>
> Hit this exception when a container failed:{noformat}2018-01-11 19:04:08,036 
> ERROR 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Failed to get tail of the container's prelaunch error log file
> java.io.FileNotFoundException: File 
> /grid/b/tmp/userlogs/application_1515190594800_1766/container_e39_1515190594800_1766_01_02/prelaunch.err
>  does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:641)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:930)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitWithFailure(ContainerLaunch.java:545)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.handleContainerExitCode(ContainerLaunch.java:511)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:319)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:93)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){noformat}
> containerLogDir is picked on container launch via 
> {{LocalDirAllocator#getLocalPathForWrite}}, which is where it looks for 
> {{prelaunch.err}} when the container fails. But prelaunch.err (and 
> prelaunch.out) are created in the first log dir (in {{ContainerLaunch#call}}): 
> {noformat}exec.writeLaunchEnv(containerScriptOutStream, environment,
> localResources, launchContext.getCommands(),
> new Path(containerLogDirs.get(0)), user);{noformat}






[jira] [Resolved] (YARN-6067) Applications API Service HA

2017-11-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved YARN-6067.
-
Resolution: Duplicate

> Applications API Service HA
> ---
>
> Key: YARN-6067
> URL: https://issues.apache.org/jira/browse/YARN-6067
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>
> We need to start thinking about HA for the Applications API Service. How do 
> we achieve it? Should the API Service become part of the RM process to get a lot 
> of things for free? Should there be some other strategy? We need to start the 
> discussion.






[jira] [Reopened] (YARN-6067) Applications API Service HA

2017-11-15 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened YARN-6067:
-

> Applications API Service HA
> ---
>
> Key: YARN-6067
> URL: https://issues.apache.org/jira/browse/YARN-6067
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Gour Saha
>
> We need to start thinking about HA for the Applications API Service. How do 
> we achieve it? Should the API Service become part of the RM process to get a lot 
> of things for free? Should there be some other strategy? We need to start the 
> discussion.






[jira] [Updated] (YARN-7225) Add queue and partition info to RM audit log

2017-09-19 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-7225:

Component/s: resourcemanager

> Add queue and partition info to RM audit log
> 
>
> Key: YARN-7225
> URL: https://issues.apache.org/jira/browse/YARN-7225
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jonathan Hung
>
> Right now the RM audit log has fields such as user, IP, resource, etc. Having 
> queue and partition info is useful for resource tracking.






[jira] [Commented] (YARN-6840) Leverage RMStateStore to store scheduler configuration updates

2017-08-02 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110380#comment-16110380
 ] 

Zhe Zhang commented on YARN-6840:
-

Could this task leverage logic from HDFS-10631 and YARN-6900?

> Leverage RMStateStore to store scheduler configuration updates
> --
>
> Key: YARN-6840
> URL: https://issues.apache.org/jira/browse/YARN-6840
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>
> With this change, users don't have to set up a separate storage system (like 
> LevelDB) to store updates to scheduler configs, and dynamic queues can be used 
> when RM HA is enabled.






[jira] [Commented] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java

2017-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059817#comment-16059817
 ] 

Zhe Zhang commented on YARN-6719:
-

Removed the release-blocker label since this is already committed to branch-2.7.

> Fix findbugs warnings in SLSCapacityScheduler.java
> --
>
> Key: YARN-6719
> URL: https://issues.apache.org/jira/browse/YARN-6719
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Fix For: 2.9.0, 2.7.4, 2.8.2
>
> Attachments: YARN-6719-branch-2.01.patch, 
> YARN-6719-branch-2.8-01.patch
>
>
> There are 2 findbugs warnings in branch-2. 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html
> {noformat}
> DmFound reliance on default encoding in 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
> java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler
> In method 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics()
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 464]
> DmFound reliance on default encoding in new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):
>  new java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable
> In method new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler)
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 669]
> {noformat}
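
The usual remedy for DM_DEFAULT_ENCODING is to construct the writer with an explicit charset instead of {{new FileWriter(String)}}; a generic sketch (not necessarily the exact change in the attached patches, and {{metricsOutputFile}} is a placeholder name):

{code}
// Generic DM_DEFAULT_ENCODING fix: never rely on the platform default encoding
// when writing the metrics log.
Writer metricsWriter = new BufferedWriter(new OutputStreamWriter(
    new FileOutputStream(metricsOutputFile), StandardCharsets.UTF_8));
{code}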






[jira] [Updated] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java

2017-06-22 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-6719:

Labels:   (was: release-blocker)

> Fix findbugs warnings in SLSCapacityScheduler.java
> --
>
> Key: YARN-6719
> URL: https://issues.apache.org/jira/browse/YARN-6719
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Fix For: 2.9.0, 2.7.4, 2.8.2
>
> Attachments: YARN-6719-branch-2.01.patch, 
> YARN-6719-branch-2.8-01.patch
>
>
> There are 2 findbugs warnings in branch-2. 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html
> {noformat}
> DmFound reliance on default encoding in 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
> java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler
> In method 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics()
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 464]
> DmFound reliance on default encoding in new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):
>  new java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable
> In method new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler)
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 669]
> {noformat}






[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2017-02-02 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-3269:

Fix Version/s: 2.7.4

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.
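
A minimal sketch of the distinction: resolve the configured log root against the file system named in its URI rather than implicitly against the default file system (illustrative only; the host name is a placeholder):

{code}
// Illustrative only: a fully-qualified value such as "hdfs://logcluster:8020/app-logs"
// should be resolved against the file system in the URI, not the cluster default FS.
Path remoteRootLogDir = new Path(conf.get("yarn.nodemanager.remote-app-log-dir"));
FileSystem remoteFs = remoteRootLogDir.getFileSystem(conf); // honors scheme/authority
{code}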






[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2017-02-02 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850240#comment-15850240
 ] 

Zhe Zhang commented on YARN-3269:
-

Thanks [~xgong] for the fix. I just cherry-picked to branch-2.7.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.






[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-13 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822741#comment-15822741
 ] 

Zhe Zhang commented on YARN-6072:
-

Thanks for the fix [~ajithshetty] [~bibinchundatt] [~Naganarasimha]. Quick 
questions: 1) is this issue valid only when RM HA is used? 2) does branch-2.7 
have the issue?

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-6072.01.branch-2.8.patch, YARN-6072.01.branch-2.patch, 
> YARN-6072.01.patch, YARN-6072.02.patch, YARN-6072.03.branch-2.8.patch, 
> YARN-6072.03.patch
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> 

[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-13 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-6072:

Component/s: resourcemanager

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: hadoop-secureuser-resourcemanager-vm1.log, 
> YARN-6072.01.branch-2.8.patch, YARN-6072.01.branch-2.patch, 
> YARN-6072.01.patch, YARN-6072.02.patch, YARN-6072.03.branch-2.8.patch, 
> YARN-6072.03.patch
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During resource manager service 

[jira] [Created] (YARN-5854) Throw more accurate exceptions from LinuxContainerExecutor#init

2016-11-07 Thread Zhe Zhang (JIRA)
Zhe Zhang created YARN-5854:
---

 Summary: Throw more accurate exceptions from 
LinuxContainerExecutor#init
 Key: YARN-5854
 URL: https://issues.apache.org/jira/browse/YARN-5854
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, yarn
Reporter: Zhe Zhang
Assignee: Jonathan Hung
Priority: Minor


YARN-5822 logs {{ContainerExecutionException}}, but doesn't include exception 
{{e}} in the IOException it throws.

Another improvement is to reduce the duplicate exception messages:
# "Failed to bootstrap configured resource subsystems!"
# "Failed to initialize linux container runtime(s)!"






[jira] [Resolved] (YARN-4998) Minor cleanup to UGI use in AdminService

2016-11-04 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang resolved YARN-4998.
-
Resolution: Fixed

> Minor cleanup to UGI use in AdminService
> 
>
> Key: YARN-4998
> URL: https://issues.apache.org/jira/browse/YARN-4998
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Trivial
> Attachments: YARN-4998.001.patch, YARN-4998.002.patch
>
>
> Instead of calling {{UserGroupInformation.getCurrentUser()}} over and over, 
> we should just use the stored {{daemonUser}}.






[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-25 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606834#comment-15606834
 ] 

Zhe Zhang commented on YARN-5734:
-

Since there is some overlap between this JIRA's objectives and those of 
YARN-5724, we plan to have a meetup to better discuss these 2 projects. Thanks 
[~wangda] and [~xgong] for proposing this. Please join in-person or remotely if 
you are interested.

*When*: Wednesday 10/26 2~4pm
*Where*: LinkedIn HQ, 950 West Maude Avenue, Sunnyvale, CA. (If you do plan to 
attend in-person, please email z...@apache.org)
*Confcall*: https://bluejeans.com/654904000 

We will post notes after the meetup.

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_Design_v0.pdf
>
>
> The current XML-based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw 2 
> main drawbacks in the file-based configuration mechanism:
> # This makes it very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It could be 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users to organizations.
> # Even if a user has admin permission on one specific queue, that user is 
> unable to make any queue configuration changes to resize the subqueues, 
> change queue ACLs, or create new queues. All these operations need to be 
> performed in a centralized manner by the cluster administrators.
> With these current limitations, we realized the need of a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed the feature internally at LinkedIn 
> which introduces the concept of MutableConfigurationProvider. What it 
> essentially does is to provide a set of configuration mutation APIs that 
> allows queue configurations to be updated externally with a set of REST APIs. 
> When performing the queue configuration changes, the queue ACLs will be 
> honored, which means only queue administrators can make configuration changes 
> to a given queue. MutableConfigurationProvider is implemented as a pluggable 
> interface, and we have one implementation of this interface which is based on 
> Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, 
> and has gone through several iterations of gathering feedback from users 
> and improving accordingly. With this feature, cluster administrators are able 
> to automate lots of the queue configuration management tasks, such as setting 
> the queue capacities to adjust cluster resources between queues based on 
> established resource consumption patterns, or managing updates to the 
> user-to-queue mappings. We have attached our design documentation to this 
> ticket and would like to receive feedback from the community regarding how 
> best to integrate it with the latest version of YARN.
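
For context, a hypothetical shape of such a pluggable provider; the method names below are illustrative and not the design document's exact API:

{code}
// Hypothetical sketch of a pluggable provider for mutable scheduler configuration.
public interface MutableConfigurationProvider {
  /** Load the currently stored scheduler configuration. */
  Configuration loadConfiguration(Configuration bootstrapConf) throws IOException;

  /**
   * Apply a set of key/value updates on behalf of a user. Implementations are
   * expected to enforce queue ACLs before persisting (e.g. to an embedded Derby DB).
   */
  void mutateConfiguration(String user, Map<String, String> updates) throws IOException;
}
{code}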






[jira] [Commented] (YARN-5734) OrgQueue for easy CapacityScheduler queue configuration management

2016-10-18 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586290#comment-15586290
 ] 

Zhe Zhang commented on YARN-5734:
-

Thanks [~mshen] [~zhouyejoe] [~jhung] for the proposal! Also thanks [~curino] 
for the very helpful feedback.

This is potentially a pretty large change, and I think we should use a feature 
branch for the development. Please share your opinions on this, thanks.

> OrgQueue for easy CapacityScheduler queue configuration management
> --
>
> Key: YARN-5734
> URL: https://issues.apache.org/jira/browse/YARN-5734
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Min Shen
>Assignee: Min Shen
> Attachments: OrgQueue_Design_v0.pdf
>
>
> The current XML-based configuration mechanism in CapacityScheduler makes it 
> very inconvenient to apply any changes to the queue configurations. We saw 2 
> main drawbacks in the file-based configuration mechanism:
> # This makes it very inconvenient to automate queue configuration updates. 
> For example, in our cluster setup, we leverage the queue mapping feature from 
> YARN-2411 to route users to their dedicated organization queues. It could be 
> extremely cumbersome to keep updating the config file to manage the very 
> dynamic mapping between users to organizations.
> # Even if a user has admin permission on one specific queue, that user is 
> unable to make any queue configuration changes to resize the subqueues, 
> change queue ACLs, or create new queues. All these operations need to be 
> performed in a centralized manner by the cluster administrators.
> With these current limitations, we realized the need of a more flexible 
> configuration mechanism that allows queue configurations to be stored and 
> managed more dynamically. We developed the feature internally at LinkedIn 
> which introduces the concept of MutableConfigurationProvider. What it 
> essentially does is to provide a set of configuration mutation APIs that 
> allows queue configurations to be updated externally with a set of REST APIs. 
> When performing the queue configuration changes, the queue ACLs will be 
> honored, which means only queue administrators can make configuration changes 
> to a given queue. MutableConfigurationProvider is implemented as a pluggable 
> interface, and we have one implementation of this interface which is based on 
> Derby embedded database.
> This feature has been deployed on LinkedIn's Hadoop cluster for a year now, 
> and has gone through several iterations of gathering feedback from users 
> and improving accordingly. With this feature, cluster administrators are able 
> to automate lots of the queue configuration management tasks, such as setting 
> the queue capacities to adjust cluster resources between queues based on 
> established resource consumption patterns, or managing updates to the 
> user-to-queue mappings. We have attached our design documentation to this 
> ticket and would like to receive feedback from the community regarding how 
> best to integrate it with the latest version of YARN.






[jira] [Commented] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions

2016-10-11 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566951#comment-15566951
 ] 

Zhe Zhang commented on YARN-3877:
-

Thanks for the work [~varun_saxena]. I just backported to branch-2.7.

> YarnClientImpl.submitApplication swallows exceptions
> 
>
> Key: YARN-3877
> URL: https://issues.apache.org/jira/browse/YARN-3877
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-3877.01.patch, YARN-3877.02.patch, 
> YARN-3877.03.patch, YARN-3877.04.patch
>
>
> When {{YarnClientImpl.submitApplication}} spins waiting for the application 
> to be accepted, any interruption during its sleep() calls is logged and 
> swallowed.
> This makes it hard to interrupt the thread during shutdown. It should really 
> throw some form of exception and let the caller deal with it.
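
As a concrete illustration of the two behaviors being discussed, the sketch below contrasts a swallow-and-continue loop with one that restores the interrupt flag and surfaces an exception; it is an assumed example, not code from {{YarnClientImpl}} or the attached patches.

{code:java}
// Illustrative only; not taken from YarnClientImpl or the patches.
import java.io.IOException;

public class SubmitPolling {

  // Anti-pattern: the interruption is logged (here, printed) and swallowed,
  // so a shutdown thread cannot break the polling loop.
  static void pollSwallowingInterrupts() {
    while (!applicationAccepted()) {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        System.err.println("Interrupted while waiting, ignoring: " + e);
      }
    }
  }

  // Preferred: restore the interrupt status and throw, letting the caller
  // decide how to handle shutdown.
  static void pollPropagatingInterrupts() throws IOException {
    while (!applicationAccepted()) {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException(
            "Interrupted while waiting for application acceptance", e);
      }
    }
  }

  // Stand-in for a real check of the application report state.
  private static boolean applicationAccepted() {
    return true;
  }
}
{code}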



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions

2016-10-04 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546975#comment-15546975
 ] 

Zhe Zhang commented on YARN-3877:
-

Hi Vinod, I'm considering this patch for branch-2.7. Any reason it was moved 
out of 2.7.2? Compatibility concern? Thanks.

> YarnClientImpl.submitApplication swallows exceptions
> 
>
> Key: YARN-3877
> URL: https://issues.apache.org/jira/browse/YARN-3877
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.7.2
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-3877.01.patch, YARN-3877.02.patch, 
> YARN-3877.03.patch, YARN-3877.04.patch
>
>
> When {{YarnClientImpl.submitApplication}} spins waiting for the application 
> to be accepted, any interruption during its sleep() calls is logged and 
> swallowed.
> This makes it hard to interrupt the thread during shutdown. It should really 
> throw some form of exception and let the caller deal with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5550) TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN

2016-08-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-5550:

Fix Version/s: 3.0.0-alpha2
   2.8.0

> TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN
> --
>
> Key: YARN-5550
> URL: https://issues.apache.org/jira/browse/YARN-5550
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, test
>Affects Versions: 2.6.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Minor
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha2
>
> Attachments: YARN-5550.001.patch, YARN-5550.002.patch, 
> YARN-5550.003.patch
>
>
> TestYarnCLI#testGetContainers hard-codes the expected output of listing 
> containers via the YARN CLI. If the timestamp is shorter than the field width 
> expected by ApplicationCLI#CONTAINER_PATTERN (which is 20 characters), the 
> assertion fails due to whitespace differences.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5550) TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN

2016-08-29 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-5550:

 Hadoop Flags: Reviewed
Fix Version/s: 2.7.4
  Component/s: test
   client

Thanks Jonathan! I just committed the patch to trunk through branch-2.7.

> TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN
> --
>
> Key: YARN-5550
> URL: https://issues.apache.org/jira/browse/YARN-5550
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, test
>Affects Versions: 2.6.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Minor
> Fix For: 2.7.4
>
> Attachments: YARN-5550.001.patch, YARN-5550.002.patch, 
> YARN-5550.003.patch
>
>
> TestYarnCLI#testGetContainers hard-codes the expected output of listing 
> containers via the YARN CLI. If the timestamp is shorter than the field width 
> expected by ApplicationCLI#CONTAINER_PATTERN (which is 20 characters), the 
> assertion fails due to whitespace differences.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5550) TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN

2016-08-29 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446717#comment-15446717
 ] 

Zhe Zhang commented on YARN-5550:
-

Thanks [~jhung] for taking on the work. Patch LGTM overall. A couple of very 
minor nits:
# Can we move {{CONTAINER_PATTERN}} below the private variables?
# We can also consider using the {{VisibleForTesting}} annotation here

+1 pending above
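
For context, here is a minimal sketch of the pattern-driven formatting the nits above refer to; the pattern string, field widths, and class name are assumptions for illustration and are not copied from {{ApplicationCLI}}.

{code:java}
// Assumed example: the test formats its expected rows through the same
// pattern constant the CLI uses, instead of hard-coding padded strings.
import com.google.common.annotations.VisibleForTesting;

public class ContainerReportFormatter {

  @VisibleForTesting
  static final String CONTAINER_PATTERN = "%30s\t%20s\t%35s%n";

  static String formatRow(String containerId, long startTime, String state) {
    // Because both the CLI and the test go through CONTAINER_PATTERN, a
    // timestamp shorter than the field width no longer breaks the comparison.
    return String.format(CONTAINER_PATTERN, containerId, startTime, state);
  }
}
{code}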

> TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN
> --
>
> Key: YARN-5550
> URL: https://issues.apache.org/jira/browse/YARN-5550
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Minor
> Attachments: YARN-5550.001.patch, YARN-5550.002.patch
>
>
> TestYarnCLI#testGetContainers hard-codes the expected output of listing 
> containers via the YARN CLI. If the timestamp is shorter than the field width 
> expected by ApplicationCLI#CONTAINER_PATTERN (which is 20 characters), the 
> assertion fails due to whitespace differences.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5550) TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN

2016-08-25 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-5550:

Assignee: Jonathan Hung

> TestYarnCLI#testGetContainers should format according to CONTAINER_PATTERN
> --
>
> Key: YARN-5550
> URL: https://issues.apache.org/jira/browse/YARN-5550
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.6.4
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Minor
> Attachments: YARN-5550.001.patch, YARN-5550.002.patch
>
>
> TestYarnCLI#testGetContainers hard-codes the expected output of listing 
> containers via the YARN CLI. If the timestamp is shorter than the field width 
> expected by ApplicationCLI#CONTAINER_PATTERN (which is 20 characters), the 
> assertion fails due to whitespace differences.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2016-04-12 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238268#comment-15238268
 ] 

Zhe Zhang commented on YARN-2694:
-

Thanks a lot Wangda for the clear explanation! I think YARN-4140 is what we 
need.

> Ensure only single node labels specified in resource request / host, and node 
> label expression only specified when resourceName=ANY
> ---
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
> YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
> YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
> YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, 
> YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, 
> YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, 
> YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, 
> YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially complete. A node label expression specified in a ResourceRequest is 
> only respected when it is specified at the ANY level, and a ResourceRequest/host 
> with multiple node labels makes user limit and related computations trickier.
> For now we need to temporarily disable these cases; the changes include:
> - AMRMClient
> - ApplicationMasterService
> - RMAdminCLI
> - CommonNodeLabelsManager
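
As a concrete example of the remaining supported case, the sketch below attaches a single node label only to the ANY-level request; it is illustrative usage of the public ResourceRequest API, not code from the patches.

{code:java}
// Illustrative only: after this change, a node label expression should be set
// on the resourceName=ANY request, and should name a single label.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class AnyLevelLabelRequest {

  public static ResourceRequest buildLabeledRequest() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(1),
        ResourceRequest.ANY,            // labels are respected only at ANY
        Resource.newInstance(2048, 1),  // 2 GB, 1 vcore
        4);                             // number of containers
    // A single label; multi-label expressions and node/rack-level labels
    // are the cases being disabled here.
    req.setNodeLabelExpression("gpu"); // "gpu" is just an example partition
    return req;
  }
}
{code}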



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2694) Ensure only single node labels specified in resource request / host, and node label expression only specified when resourceName=ANY

2016-04-06 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229313#comment-15229313
 ] 

Zhe Zhang commented on YARN-2694:
-

Hi [~jianhe], [~leftnoteasy], I have a few questions about this change:
bq. Currently, node label expression support in the capacity scheduler is only 
partially complete. A node label expression specified in a ResourceRequest is 
only respected when it is specified at the ANY level.
Could you elaborate a bit on this? Is it because it is hard to satisfy both 
node label and locality requirements? If so, when do you think we will be ready 
to enable node- or rack-level resource requests?

Thanks,


> Ensure only single node labels specified in resource request / host, and node 
> label expression only specified when resourceName=ANY
> ---
>
> Key: YARN-2694
> URL: https://issues.apache.org/jira/browse/YARN-2694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2694-20141020-1.patch, YARN-2694-20141021-1.patch, 
> YARN-2694-20141023-1.patch, YARN-2694-20141023-2.patch, 
> YARN-2694-20141101-1.patch, YARN-2694-20141101-2.patch, 
> YARN-2694-20150121-1.patch, YARN-2694-20150122-1.patch, 
> YARN-2694-20150202-1.patch, YARN-2694-20150203-1.patch, 
> YARN-2694-20150203-2.patch, YARN-2694-20150204-1.patch, 
> YARN-2694-20150205-1.patch, YARN-2694-20150205-2.patch, 
> YARN-2694-20150205-3.patch, YARN-2694-branch-2.6.1.txt
>
>
> Currently, node label expression support in the capacity scheduler is only 
> partially complete. A node label expression specified in a ResourceRequest is 
> only respected when it is specified at the ANY level, and a ResourceRequest/host 
> with multiple node labels makes user limit and related computations trickier.
> For now we need to temporarily disable these cases; the changes include:
> - AMRMClient
> - ApplicationMasterService
> - RMAdminCLI
> - CommonNodeLabelsManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4857) Add missing default configuration regarding preemption of CapacityScheduler

2016-04-01 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-4857:

Fix Version/s: (was: 2.9.0)

> Add missing default configuration regarding preemption of CapacityScheduler
> ---
>
> Key: YARN-4857
> URL: https://issues.apache.org/jira/browse/YARN-4857
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, documentation
>Reporter: Kai Sasaki
>Assignee: Kai Sasaki
>Priority: Minor
>  Labels: documentaion
> Attachments: YARN-4857.01.patch, YARN-4857.02.patch
>
>
> The {{yarn.resourcemanager.monitor.*}} configurations are missing from 
> yarn-default.xml. Since they were documented explicitly by YARN-4492, 
> yarn-default.xml can be updated to match.
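
A small, assumed example of why the yarn-default.xml entries matter: readers of a missing key silently fall back to the hard-coded default passed to {{Configuration}}, so documenting the defaults in yarn-default.xml keeps the two in sync. The key name and 3000 ms fallback below are illustrative, not a complete list of the missing entries.

{code:java}
// Assumed example: reading one yarn.resourcemanager.monitor.* key with a
// code-level fallback that applies when yarn-default.xml has no entry.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PreemptionMonitorInterval {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    long intervalMs = conf.getLong(
        "yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval",
        3000L); // fallback used only when the key is absent everywhere
    System.out.println("Preemption monitor interval: " + intervalMs + " ms");
  }
}
{code}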



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)