Github user markgrover commented on the pull request:
https://github.com/apache/spark/pull/8093#issuecomment-129709754
This pull request is meant to achieve two goals:
1. Show in the driver logs, primarily in yarn-client mode, when YARN is killing
containers because one or more thresholds (say, physical or virtual memory)
are being exceeded.
2. Display the above reason in the Spark UI.
Here's some more context on how the above have been achieved:
For (1) above, I considered two options - a) adding a new RPC message
ContainerRemoved from the YarnAllocator to the YarnSchedulerBackend which will
be sent when a container is killed by YARN or b) simply extending and using the
RemoveExecutor message that was being passed from YarnAllocator to
YarnSchedulerBackend already. While I did implement (a), I ended up [reverting
it](https://github.com/markgrover/spark/commit/47c20c0f794d654bc4c7f08809373274cc16b7be),
and going with (b) because of its simplicity.
For (2) above, I extended the ExecutorLostFailure case class, which gets
sent down the ListenerBus by the scheduler whenever an executor is lost. That
ends up being picked up by JobProgressListener and finally shows up in the UI.
I've attached a [picture on the
JIRA](https://issues.apache.org/jira/secure/attachment/12749771/error_showing_in_UI.png)
of what the error message in the UI looks like.
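To make the flow for (1) and (2) concrete, here is a minimal standalone sketch in Java. It is not the code in this pull request: the classes below are hypothetical stand-ins that only borrow the names RemoveExecutor and ExecutorLostFailure, and the field names and message wording are assumptions for illustration. The idea it shows is that the kill reason travels with the existing removal message and is then folded into the failure text that the UI displays.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
public class ExecutorLossSketch {

    // Models the message sent from the allocator to the scheduler backend,
    // extended with a human-readable reason (assumed field name).
    static final class RemoveExecutor {
        final String executorId;
        final String reason; // may be null when no reason is known

        RemoveExecutor(String executorId, String reason) {
            this.executorId = executorId;
            this.reason = reason;
        }
    }

    // Models the failure that is posted to listeners and rendered in the UI.
    static final class ExecutorLostFailure {
        final String execId;
        final Optional<String> reason;

        ExecutorLostFailure(String execId, Optional<String> reason) {
            this.execId = execId;
            this.reason = reason;
        }

        // Appends the reason, when present, to the base error text.
        String toErrorString() {
            String base = "ExecutorLostFailure (executor " + execId + " lost)";
            return reason.map(r -> base + " Reason: " + r).orElse(base);
        }
    }

    // Carries the reason from the removal message into the failure.
    static ExecutorLostFailure onRemoveExecutor(RemoveExecutor msg) {
        return new ExecutorLostFailure(msg.executorId, Optional.ofNullable(msg.reason));
    }

    public static void main(String[] args) {
        RemoveExecutor msg = new RemoveExecutor(
            "3", "Container killed by YARN for exceeding memory limits.");
        System.out.println(onRemoveExecutor(msg).toErrorString());
    }
}
```

Keeping the reason optional means callers that have no reason to report still produce the original, shorter message.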
**Testing**
While I have updated the unit tests that were impacted by this, I also had
to find a deterministic way of getting YARN to kill the container. For
that, I set spark.yarn.executor.memoryOverhead to a very low number and used an
app that allocates a lot of
[ByteBuffers](http://docs.oracle.com/javase/7/docs/api/java/nio/ByteBuffer.html#allocateDirect(int)).
The code for this app can be found at https://github.com/markgrover/spark-app
It's simply a Pi program, much like the default Spark Pi app, but it creates a
bunch of direct ByteBuffers while it's at it. It can be invoked like:
spark-submit --class com.markgrover.spark.ModifiedPi --master yarn
--deploy-mode client ~/spark-app/target/my-spark-app-0.0.1-SNAPSHOT.jar 1000
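For readers who don't want to open the repository, here is a rough standalone Java sketch of the allocation trick. It is not the code of ModifiedPi: the class name, method names, and sizes are made up for illustration. Direct buffers live outside the JVM heap, so they count toward the process memory that YARN monitors but are not bounded by the executor heap size. That is why a very low memory overhead setting makes the kill easy to reproduce.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the linked spark-app repo.
public class DirectBufferHog {

    // Allocates `count` direct (off-heap) buffers of `bytesEach` bytes and
    // returns them so the caller can keep them reachable (and thus allocated).
    static List<ByteBuffer> allocateDirectBuffers(int count, int bytesEach) {
        List<ByteBuffer> held = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bytesEach);
            // Write one byte every 4096 bytes (an assumed page size) so the
            // memory is actually touched rather than merely reserved.
            for (int pos = 0; pos < bytesEach; pos += 4096) {
                buf.put(pos, (byte) 1);
            }
            held.add(buf);
        }
        return held;
    }

    // Sums the capacity of all held buffers, in bytes.
    static long totalCapacity(List<ByteBuffer> buffers) {
        long total = 0L;
        for (ByteBuffer b : buffers) {
            total += b.capacity();
        }
        return total;
    }

    public static void main(String[] args) {
        // Small numbers here; a real reproduction would allocate far more
        // inside each task so the container exceeds its memory limit.
        List<ByteBuffer> held = allocateDirectBuffers(8, 1024 * 1024);
        System.out.println("Holding " + totalCapacity(held) + " bytes off-heap");
    }
}
```

In a Spark job, the allocation would happen inside the function run by each task, with the buffers kept reachable for the task's lifetime. Note that the JVM caps total direct memory (tunable with -XX:MaxDirectMemorySize), so that cap needs to be higher than the container's overhead allowance for YARN to be the one that kills the process.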