I was able to reproduce it with the attached test. It also comes up consistently when running a test suite. So far we've only tested with the fair scheduler, so it's possible it's specific to that, but what I described below should apply to all schedulers.
On Mon, Jan 7, 2013 at 6:05 PM, Bikas Saha <[email protected]> wrote:

> Do you have a repro for this?
>
> Bikas
>
> -----Original Message-----
> From: Sandy Ryza [mailto:[email protected]]
> Sent: Monday, January 07, 2013 5:40 PM
> To: [email protected]
> Subject: multiple requests for
>
> I've come across an NPE in AppSchedulingInfo so I looked around to try to
> determine the cause, and I think came across a problem with how containers
> are scheduled. It seems like somebody should have run into this already,
> so I wanted to ask about it before I filed a JIRA. Am I just
> misunderstanding how things work?
>
> When requesting a node-local container, YARN schedulers expect three
> ResourceRequests - one at the node-level, one at the rack level, and one
> at the "*" level. For each application and priority, these requests are
> stored by the RM as a map of location strings to ResourceRequests.
> Schedulers try to schedule requests node-locally, but do rack-local, and
> then off-switch, after a given number of heartbeats pass. When a
> node-local container is allocated, the number of outstanding containers is
> decremented at each level. When a rack-local container is allocated, only
> the number of outstanding rack local and "*" requests are decremented.
>
> This means that if a rack-local container is allocated, the node-local
> container will still be around, and when the scheduler tries to allocate
> it, the scheduler should run into an NPE, as there will be no rack-local
> ResourceRequest to decrement.
> What would be the best way to deal with this? It seems like node-local
> ResourceRequests need to be tied to rack-local ResourceRequests, so that
> node-local requests can be removed when their corresponding rack-local
> requests are, but the current AllocateRequest is a list of independent
> resource requests.
>
> thanks for any guidance,
> Sandy
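
To make the scenario in the quoted message concrete, here is a minimal sketch of the bookkeeping as described there. It is not the actual AppSchedulingInfo code: the class LocalityBookkeepingSketch, its methods, and the node and rack names are all hypothetical. It also assumes that an entry is dropped from the map once its outstanding count reaches zero, which is what the "no rack-local ResourceRequest to decrement" description implies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-application, per-priority bookkeeping.
// This is NOT the real AppSchedulingInfo; it only mirrors the description above.
class LocalityBookkeepingSketch {
    static final String ANY = "*";

    // Location string (node, rack, or "*") -> number of outstanding containers.
    final Map<String, Integer> outstanding = new HashMap<>();

    // A node-local ask is recorded at three levels: node, rack, and "*".
    void request(String node, String rack, int containers) {
        outstanding.merge(node, containers, Integer::sum);
        outstanding.merge(rack, containers, Integer::sum);
        outstanding.merge(ANY, containers, Integer::sum);
    }

    // Decrement one level. Assumption: the entry is removed when it hits zero.
    private void decrement(String location) {
        // Unboxing a missing (null) entry throws NullPointerException.
        int remaining = outstanding.get(location) - 1;
        if (remaining == 0) {
            outstanding.remove(location);
        } else {
            outstanding.put(location, remaining);
        }
    }

    // Node-local allocation decrements all three levels.
    void allocateNodeLocal(String node, String rack) {
        decrement(node);
        decrement(rack);
        decrement(ANY);
    }

    // Rack-local allocation decrements only the rack and "*" levels.
    void allocateRackLocal(String rack) {
        decrement(rack);
        decrement(ANY);
    }

    public static void main(String[] args) {
        LocalityBookkeepingSketch app = new LocalityBookkeepingSketch();
        app.request("node1", "rack1", 1);

        // The single container is satisfied rack-locally.
        app.allocateRackLocal("rack1");

        // The node-level entry is left behind with nothing at the rack level.
        System.out.println("still outstanding: " + app.outstanding);

        try {
            // A later node-local allocation finds no rack-level entry.
            app.allocateNodeLocal("node1", "rack1");
        } catch (NullPointerException e) {
            System.out.println("NPE while decrementing the rack-level request");
        }
    }
}
```

In this sketch the stale node-level entry is what triggers the failure, which is why tying node-local requests to their rack-local counterparts, as suggested in the quoted message, would avoid it.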
