As a work-around, if you turn on DRF / multi-resource scheduling, you could use vcore capacities to limit the number of containers per node?
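Roughly, that would mean switching the Capacity Scheduler's resource calculator to DRF and sizing container requests by vcores as well as memory. A sketch (untested; property names are the standard 2.x ones, and the vcore numbers are only illustrative):

  <!-- capacity-scheduler.xml: account for CPU as well as memory (DRF) -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

  <!-- yarn-site.xml on each NM: advertise, say, 8 vcores per node -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>

If each container then asks for 4 vcores (e.g. Resource.newInstance(2048, 4) in the AM's resource request), a node would cap out at 2 containers regardless of how much memory is still free.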
On Fri, Mar 21, 2014 at 11:35 PM, Chris Riccomini <[email protected]> wrote:

> Hey Guys,
>
> @Vinod: We aren't overriding the default, so we must be using -1 as the
> setting.
>
> @Sandy: We aren't specifying any racks/hosts when sending the resource
> requests. +1 regarding introducing a similar limit in the capacity scheduler.
>
> Any recommended work-arounds in the meantime? Our utilization of the grid
> is very low because we're having to force high memory requests for the
> containers in order to guarantee a maximum number of containers on a
> single node (e.g. setting container memory to 17GB to disallow more
> than 2 containers from being assigned to any one 48GB node).
>
> Cheers,
> Chris
>
> On 3/21/14 11:30 PM, "Sandy Ryza" <[email protected]> wrote:
>
> > yarn.scheduler.capacity.node-locality-delay will help if the app is
> > requesting containers at particular locations, but won't help spread
> > things out evenly otherwise.
> >
> > The Fair Scheduler attempts an even spread. By default, it only schedules
> > a single container each time it considers a node. Decoupling scheduling
> > from node heartbeats (YARN-1010) makes it so that a high node heartbeat
> > interval doesn't result in this being slow. Now that the Capacity
> > Scheduler has similar capabilities (YARN-1512), it might make sense to
> > introduce a similar limit?
> >
> > -Sandy
> >
> > On Fri, Mar 21, 2014 at 4:42 PM, Vinod Kumar Vavilapalli
> > <[email protected]> wrote:
> >
> > > What's the value for yarn.scheduler.capacity.node-locality-delay? It is
> > > -1 by default in 2.2.
> > >
> > > We fixed the default to a more reasonable 40 (nodes in a rack) in 2.3.0;
> > > that should spread containers a bit.
> > >
> > > Thanks,
> > > +Vinod
> > >
> > > On Mar 21, 2014, at 12:48 PM, Chris Riccomini <[email protected]>
> > > wrote:
> > >
> > > > Hey Guys,
> > > >
> > > > We're running YARN 2.2 with the capacity scheduler. Each NM is running
> > > > with 40G of memory capacity. When we request a series of containers
> > > > with 2G of memory from a single AM, we see the RM assigning them
> > > > entirely to one NM until that NM is full, and then moving on to the
> > > > next, and so on. Essentially, we have a grid with 20 nodes, and two are
> > > > completely full, and the rest are completely empty. This is problematic
> > > > because our containers use disk heavily, and are completely saturating
> > > > the disks on the two nodes, which slows down all of the containers on
> > > > these NMs.
> > > >
> > > > 1. Is this expected behavior of the capacity scheduler? What about the
> > > > FIFO scheduler?
> > > > 2. Is the recommended work-around just to increase the memory
> > > > allocation per container as a proxy for the disk capacity that's
> > > > required? Given that there's no disk-level isolation, and no
> > > > disk-level resource, I don't see another way around this.
> > > >
> > > > Cheers,
> > > > Chris
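(For reference, the locality-delay setting Vinod mentions in the quoted thread is just a capacity-scheduler.xml property; on 2.2 you would have to set it explicitly, something like:

  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>

though, as Sandy notes, it only helps when the resource requests actually carry node/rack locations.)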
