[
https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640147#comment-16640147
]
Wangda Tan commented on YARN-8849:
----------------------------------
Thanks [~asuresh] and [[email protected]] for filing the proposal. This looks
very interesting.
However, I have doubts about some parts and would appreciate clarification:
1) If no container is actually launched, how can we test changes such as the
cgroups OOM killer, utilization metrics collection from the OS, etc.?
2) If one of the purposes is to address the simulated NM / AM running inside
the same process as the RM, we can run the simulated AM/NM in a separate
process / machine to better isolate resource usage.
3) Can we do large-scale simulation using a relatively small number of
machines (for example, using 10 nodes to simulate 10k nodes)?
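
To put rough numbers on question 3, here is a back-of-envelope sketch. The helper name is hypothetical, and the 1-second NM heartbeat interval is an assumption for illustration, not a measurement from any cluster.

```python
def simulation_load(simulated_nodes, hosts, heartbeat_interval_s=1.0):
    """Return (simulated NMs per host, heartbeats per second arriving at the RM).

    heartbeat_interval_s is an assumed value; adjust it to the cluster's setting.
    """
    if hosts <= 0 or heartbeat_interval_s <= 0:
        raise ValueError("hosts and heartbeat_interval_s must be positive")
    # Ceiling division: every simulated NM must be placed on some host.
    nms_per_host = -(-simulated_nodes // hosts)
    # Each simulated NM sends one heartbeat per interval to the single RM.
    rm_heartbeats_per_s = simulated_nodes / heartbeat_interval_s
    return nms_per_host, rm_heartbeats_per_s


per_host, rm_rate = simulation_load(simulated_nodes=10_000, hosts=10)
print(f"{per_host} simulated NMs per host, {rm_rate:.0f} heartbeats/s at the RM")
```

Under these assumptions each host carries 1000 simulated NMs, while the RM sees the same heartbeat rate as a real 10k-node cluster would produce.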
> DynoYARN: A simulation and testing infrastructure for YARN clusters
> -------------------------------------------------------------------
>
> Key: YARN-8849
> URL: https://issues.apache.org/jira/browse/YARN-8849
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Arun Suresh
> Assignee: Keqiu Hu
> Priority: Major
>
> Traditionally, YARN workload simulation is performed using SLS (Scheduler
> Load Simulator), which is packaged with YARN. It essentially starts a
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager* and
> the *ApplicationMaster* containers. These simulators are lightweight and run in a
> threadpool. The NM simulators do not open any external ports and send
> (in-process) heartbeats to the ResourceManager.
> There are a couple of drawbacks with using the SLS:
> * It might be difficult to simulate really large clusters without having
> access to a very beefy box - since the NMs are launched as tasks in a
> threadpool, and each NM has to send periodic heartbeats to the RM.
> * Certain features (like YARN-1011) require changes to the NodeManager -
> aspects such as queuing and selectively killing containers have to be
> incorporated into the existing NM Simulator, which might make the simulator a
> bit heavyweight - there is a need for locking and synchronization.
> * Since the NM and AM are simulations, only the Scheduler is faithfully
> tested - it does not really perform an end-to-end test of a cluster.
> Therefore, drawing inspiration from
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a framework
> for a YARN-deployable YARN cluster - *DynoYARN* - for testing, with the
> following features:
> * The NM already has hooks to plug in a custom *ContainerExecutor* and
> *NodeResourceMonitor*. If we can also plug in a custom monitoring thread for
> *ContainersMonitorImpl* (and other modules like the LocalizationService), we
> can probably inject an Executor that does not actually launch containers and a
> Node and Container resource monitor that reports synthetic pre-specified
> utilization metrics back to the RM.
> * Since we are launching fake containers, we cannot run normal AM
> containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.
> Essentially, a test workflow would look like this:
> * Launch a DynoYARN cluster.
> * Use the Unmanaged AM feature to directly negotiate with the DynoYARN
> Resource Manager for container tokens.
> * Use the container tokens from the RM to directly ask the DynoYARN Node
> Managers to start fake containers.
> * The DynoYARN NodeManagers will start the fake containers and report to the
> DynoYARN Resource Manager synthetically generated resource utilization for
> the containers (which will be injected via the *ContainerLaunchContext* and
> parsed by the plugged-in Container Executor).
> * The Scheduler will use the utilization report to schedule containers - we
> will be able to test allocation of *Opportunistic* containers based on
> resource utilization.
> * Since the DynoYARN Node Managers run the actual code paths, all preemption
> and queuing logic will be faithfully executed.
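
The workflow above can be modeled with a toy sketch. None of the classes below are Hadoop APIs: `LaunchContext`, `FakeContainerExecutor`, and `FakeNodeManager` are hypothetical stand-ins, written only to show an executor that records a container without starting a process and a node that reports the synthetic utilization carried in the launch context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchContext:
    """Hypothetical stand-in for a launch context carrying synthetic utilization."""
    container_id: str
    synthetic_memory_mb: int
    synthetic_vcores: float


class FakeContainerExecutor:
    """Tracks containers as 'running' without starting any OS process."""

    def __init__(self):
        self.running = {}

    def launch(self, ctx):
        # No process is forked; the launch context is simply recorded.
        self.running[ctx.container_id] = ctx

    def stop(self, container_id):
        self.running.pop(container_id, None)


class FakeNodeManager:
    """Toy node that reports utilization parsed from the launch contexts."""

    def __init__(self, node_id, executor):
        self.node_id = node_id
        self.executor = executor

    def start_container(self, ctx):
        self.executor.launch(ctx)

    def heartbeat(self):
        # Build the utilization report a resource manager would receive.
        contexts = list(self.executor.running.values())
        return {
            "node_id": self.node_id,
            "containers": sorted(c.container_id for c in contexts),
            "memory_mb": sum(c.synthetic_memory_mb for c in contexts),
            "vcores": sum(c.synthetic_vcores for c in contexts),
        }


nm = FakeNodeManager("nm-1", FakeContainerExecutor())
nm.start_container(LaunchContext("c1", synthetic_memory_mb=512, synthetic_vcores=0.5))
nm.start_container(LaunchContext("c2", synthetic_memory_mb=1024, synthetic_vcores=1.5))
print(nm.heartbeat())
```

In this model the scheduler side would consume the `heartbeat()` report; a real implementation would have to go through the NM's actual executor and monitor plug-in points, which this sketch does not attempt to reproduce.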