Do you have a repro for this?

Bikas

-----Original Message-----
From: Sandy Ryza [mailto:[email protected]]
Sent: Monday, January 07, 2013 5:40 PM
To: [email protected]
Subject: multiple requests for

I've come across an NPE in AppSchedulingInfo, so I looked around to try to determine the cause, and I think I came across a problem with how containers are scheduled. It seems like somebody should have run into this already, so I wanted to ask about it before I filed a JIRA. Am I just misunderstanding how things work?

When requesting a node-local container, YARN schedulers expect three ResourceRequests: one at the node level, one at the rack level, and one at the "*" level. For each application and priority, these requests are stored by the RM as a map of location strings to ResourceRequests. Schedulers first try to satisfy requests node-locally, but fall back to rack-local, and then off-switch, after a given number of heartbeats pass.

When a node-local container is allocated, the number of outstanding containers is decremented at each level. When a rack-local container is allocated, only the numbers of outstanding rack-local and "*" requests are decremented. This means that if a rack-local container is allocated, the node-local request will still be around, and when the scheduler tries to allocate it, the scheduler should run into an NPE, as there will be no rack-local ResourceRequest to decrement.

What would be the best way to deal with this? It seems like node-local ResourceRequests need to be tied to rack-local ResourceRequests, so that node-local requests can be removed when their corresponding rack-local requests are, but the current AllocateRequest is a list of independent resource requests.

thanks for any guidance,
Sandy
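
For anyone trying to reproduce this, here is a minimal sketch of the bookkeeping described above. It is not the actual AppSchedulingInfo code: the class name, the method names, and the use of plain integer counts in place of ResourceRequest objects are all made up for illustration, and it assumes an entry is removed from the map once its count reaches zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-application, per-priority bookkeeping
// described in the message above. NOT the real AppSchedulingInfo.
class LocalityBookkeepingSketch {
    static final String ANY = "*";

    // Location string (node, rack, or "*") -> outstanding container count.
    final Map<String, Integer> outstanding = new HashMap<>();

    // A node-local ask is recorded at the node, rack, and "*" levels.
    void request(String node, String rack, int containers) {
        outstanding.merge(node, containers, Integer::sum);
        outstanding.merge(rack, containers, Integer::sum);
        outstanding.merge(ANY, containers, Integer::sum);
    }

    // Assumption: an entry disappears once its count reaches zero.
    private void decrement(String location) {
        Integer count = outstanding.get(location);
        int remaining = count - 1; // unboxing a missing entry throws the NPE
        if (remaining == 0) {
            outstanding.remove(location);
        } else {
            outstanding.put(location, remaining);
        }
    }

    // Node-local allocation decrements all three levels.
    void allocateNodeLocal(String node, String rack) {
        decrement(node);
        decrement(rack);
        decrement(ANY);
    }

    // Rack-local allocation decrements only the rack and "*" levels,
    // leaving the node-level entry behind.
    void allocateRackLocal(String rack) {
        decrement(rack);
        decrement(ANY);
    }

    public static void main(String[] args) {
        LocalityBookkeepingSketch s = new LocalityBookkeepingSketch();
        s.request("node1", "rack1", 1);
        s.allocateRackLocal("rack1"); // the node1 entry is now orphaned
        try {
            s.allocateNodeLocal("node1", "rack1");
        } catch (NullPointerException e) {
            System.out.println("NPE: no rack-level entry left to decrement");
        }
    }
}
```

In this sketch the failure goes away if the rack-local path also removes or decrements the node-level entries tied to that rack, which is the coupling the message suggests is missing from the independent list in AllocateRequest.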
