GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/3765
SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnA...
...llocator
The goal of this PR is to simplify YarnAllocator as much as possible and
get it up to the level of code quality we see in the rest of Spark.
In service of this, it does a few things:
* Uses AMRMClient APIs for matching containers to requests.
* Adds calls to AMRMClient.removeContainerRequest so that, when we use a
container, we don't end up requesting it again.
* Removes YarnAllocator's host->rack cache. YARN's RackResolver already
does this caching, so this is redundant.
* Adds tests for basic YarnAllocator functionality.
* Breaks up the allocateResources method, which was previously nearly 300
lines.
* A little bit of stylistic cleanup.
* Fixes a bug that causes three times the requests to be filed when
preferred host locations are given.
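
As an illustration of the first two bullets, here is a minimal Python sketch of the bookkeeping involved. It is a toy model, not YARN's AMRMClient and not the Scala code in this patch: the class `ToyRequestTable`, its methods, and the helper `handle_allocated` are hypothetical names introduced only for this example. It assumes a request is either for a specific host or for any host (`"*"`), and ignores racks, priorities and resource sizes.

```python
from collections import defaultdict


class ToyRequestTable:
    """Hypothetical stand-in for the client-side table of outstanding requests."""

    def __init__(self):
        # location ("host1", or "*" for any host) -> number of outstanding requests
        self.outstanding = defaultdict(int)

    def add_request(self, location="*"):
        self.outstanding[location] += 1

    def match(self, host):
        """Return the location of a request that a container on `host` satisfies, or None."""
        # Prefer a host-specific request, then fall back to an any-host request.
        for location in (host, "*"):
            if self.outstanding[location] > 0:
                return location
        return None

    def remove_request(self, location):
        self.outstanding[location] -= 1

    def total_outstanding(self):
        return sum(self.outstanding.values())


def handle_allocated(table, hosts, remove_matched=True):
    """Match allocated containers (identified by host) to requests; return the hosts kept."""
    kept = []
    for host in hosts:
        location = table.match(host)
        if location is None:
            continue  # no outstanding request fits; a real allocator would release it
        kept.append(host)
        if remove_matched:
            # Without this step the request stays outstanding and is asked for again.
            table.remove_request(location)
    return kept
```

In this model, calling `handle_allocated` with `remove_matched=False` leaves the request counted as outstanding even though a container is already in use for it, which is the situation the `removeContainerRequest` calls are meant to avoid.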
The patch is lossy. In particular, it loses the logic for trying to avoid
containers bunching up on nodes. As I understand it, the logic that's gone is:
* If, in a single response from the RM, we receive a set of containers on a
node, and prefer some number of containers on that node greater than 0 but less
than the number we received, give back the delta between what we preferred and
what we received.
This seems like a weird way to avoid bunching. E.g., it does nothing to avoid
bunching when we don't request containers on particular nodes.
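
The rule described above can be sketched as follows. This is a reading of the prose description only, written in Python for brevity, and not the removed Scala code; the function name `containers_to_give_back` and its parameters are hypothetical.

```python
from collections import Counter


def containers_to_give_back(received_hosts, preferred_counts):
    """Per host, how many containers the described rule would return to the RM.

    received_hosts: hosts of the containers received in a single RM response.
    preferred_counts: host -> number of containers preferred on that host.
    """
    received = Counter(received_hosts)
    give_back = {}
    for host, got in received.items():
        preferred = preferred_counts.get(host, 0)
        # Only applies when we prefer some containers on the host, but fewer than we got.
        if 0 < preferred < got:
            give_back[host] = got - preferred
    return give_back
```

For example, receiving three containers on `h1` while preferring one there gives back two, whereas receiving three containers on `h1` with no host preferences at all gives back nothing, so in this sketch the containers stay bunched on that node.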
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-1714
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3765.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3765
----
commit 1becc3794b12000b2ad32c8cf4593652543641c6
Author: Sandy Ryza <[email protected]>
Date: 2014-12-22T05:34:39Z
SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in
YarnAllocator
----