[ https://issues.apache.org/jira/browse/YARN-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060563#comment-17060563 ]
Jonathan Hung commented on YARN-8849:
-------------------------------------

Hey [~brahmareddy], we're working on the open-sourcing process; we'll post here when there's an update.

> DynoYARN: A simulation and testing infrastructure for YARN clusters
> -------------------------------------------------------------------
>
>                 Key: YARN-8849
>                 URL: https://issues.apache.org/jira/browse/YARN-8849
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Arun Suresh
>            Assignee: Jonathan Hung
>            Priority: Major
>
> Traditionally, YARN workload simulation is performed using SLS (Scheduler
> Load Simulator), which is packaged with YARN. It essentially starts a
> full-fledged *ResourceManager*, but runs simulators for the *NodeManager* and
> the *ApplicationMaster* containers. These simulators are lightweight and run
> in a threadpool. The NM simulators do not open any external ports and send
> (in-process) heartbeats to the ResourceManager.
> There are a few drawbacks to using the SLS:
> * It might be difficult to simulate really large clusters without having
> access to a very beefy box - since the NMs are launched as tasks in a
> threadpool, and each NM has to send periodic heartbeats to the RM.
> * Certain features (like YARN-1011) require changes to the NodeManager -
> aspects such as queuing and selectively killing containers have to be
> incorporated into the existing NM simulator, which might make the simulator a
> bit heavyweight - there is a need for locking and synchronization.
> * Since the NM and AM are simulations, only the Scheduler is faithfully
> tested - it does not really perform an end-to-end test of a cluster.
> Therefore, drawing inspiration from
> [Dynamometer|https://github.com/linkedin/dynamometer], we propose a framework
> for a YARN-deployable YARN cluster - *DynoYARN* - for testing, with the
> following features:
> * The NM already has hooks to plug in a custom *ContainerExecutor* and
> *NodeResourceMonitor*. If we can plug in a custom monitoring thread for
> *ContainersMonitorImpl* (and other modules like the LocalizationService), we
> can probably inject an Executor that does not actually launch containers and
> a Node and Container resource monitor that reports synthetic, pre-specified
> utilization metrics back to the RM.
> * Since we are launching fake containers, we cannot run normal AM
> containers. We can therefore use *Unmanaged AMs* to launch synthetic jobs.
> Essentially, a test workflow would look like this:
> * Launch a DynoYARN cluster.
> * Use the Unmanaged AM feature to directly negotiate with the DynoYARN
> Resource Manager for container tokens.
> * Use the container tokens from the RM to directly ask the DynoYARN Node
> Managers to start fake containers.
> * The DynoYARN NodeManagers will start the fake containers and report to the
> DynoYARN Resource Manager synthetically generated resource utilization for
> the containers (which will be injected via the *ContainerLaunchContext* and
> parsed by the plugged-in Container Executor).
> * The Scheduler will use the utilization report to schedule containers - we
> will be able to test allocation of *Opportunistic* containers based on
> resource utilization.
> * Since the DynoYARN Node Managers run the actual code paths, all preemption
> and queuing logic will be faithfully executed.
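
To make the fake-container idea in the description concrete, here is a minimal, self-contained Java sketch. It does not use any Hadoop classes, and every name in it (SyntheticUtilization, FakeContainerExecutor, UtilizationAwarePolicy, and the percentage threshold) is hypothetical and chosen only for illustration. It is not the DynoYARN implementation, and the admission check is not the YARN-1011 algorithm. It only shows the shape of the flow: an executor records pre-specified utilization instead of launching a process, the node reports the sum, and a scheduler-side check uses that report to decide whether an opportunistic container fits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value type: utilization a fake container claims to use. */
final class SyntheticUtilization {
    final long memoryMb;
    final long milliVcores;

    SyntheticUtilization(long memoryMb, long milliVcores) {
        this.memoryMb = memoryMb;
        this.milliVcores = milliVcores;
    }
}

/** Hypothetical stand-in for a plugged-in executor that never forks a process. */
final class FakeContainerExecutor {
    private final Map<String, SyntheticUtilization> running = new LinkedHashMap<>();

    /** "Launches" a container by recording the utilization declared for it. */
    void launch(String containerId, SyntheticUtilization declared) {
        running.put(containerId, declared);
    }

    /** Marks a fake container as finished so it stops contributing to the report. */
    void complete(String containerId) {
        running.remove(containerId);
    }

    /** Node-level report: the sum of the declared utilization of running fakes. */
    SyntheticUtilization nodeUtilization() {
        long memoryMb = 0;
        long milliVcores = 0;
        for (SyntheticUtilization u : running.values()) {
            memoryMb += u.memoryMb;
            milliVcores += u.milliVcores;
        }
        return new SyntheticUtilization(memoryMb, milliVcores);
    }
}

/** Hypothetical scheduler-side check (an assumption, not YARN's actual policy). */
final class UtilizationAwarePolicy {
    private UtilizationAwarePolicy() {}

    /** True if used + requested memory stays within thresholdPercent of capacity. */
    static boolean fits(long usedMb, long requestMb, long capacityMb, long thresholdPercent) {
        return (usedMb + requestMb) * 100 <= capacityMb * thresholdPercent;
    }
}

final class DynoYarnSketch {
    public static void main(String[] args) {
        FakeContainerExecutor executor = new FakeContainerExecutor();
        executor.launch("container_1", new SyntheticUtilization(1024, 500));
        executor.launch("container_2", new SyntheticUtilization(2048, 250));

        SyntheticUtilization report = executor.nodeUtilization();
        System.out.println("reported memory (MB): " + report.memoryMb);
        System.out.println("reported CPU (milli-vcores): " + report.milliVcores);

        // Assumed node capacity of 8192 MB and an 80% oversubscription threshold.
        boolean admit = UtilizationAwarePolicy.fits(report.memoryMb, 1024, 8192, 80);
        System.out.println("admit 1024 MB opportunistic container: " + admit);
    }
}
```

In a real DynoYARN NodeManager the declared utilization would presumably arrive through the *ContainerLaunchContext*, as the description says, rather than as a direct method argument. The sketch passes it directly only to stay independent of the Hadoop APIs.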