Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/3001#issuecomment-61360218
@rxin Please take a look at my last commit, fd3928b. It contains some critical
code that handles a couple of cases I describe below.
I have completed performance and correctness testing on a real cluster.
Performance-wise, I saw no regression compared with the in-executor version.
Memory usage on the Worker, which is where I put the server, was also minimal:
I ran several medium-sized shuffles using Workers with the default 512MB max
heap size and saw no noticeable garbage collection or heap growth.
During testing, I noticed that we were dropping map outputs when I killed an
executor in the middle of a map or reduce phase. (In local testing, my queries
ran so quickly that I had always killed the executor after the job completed.)
This caused us to recompute the map tasks unnecessarily. I have added code that
drops map outputs only if (1) external shuffle is disabled or (2) we are
specifically responding to a fetch failure.
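The drop rule can be modeled as a small decision function. This is a minimal, hypothetical Python sketch, not Spark's actual scheduler code. `MapOutputRegistry`, `should_drop_map_outputs`, and `handle_executor_lost` are illustrative names. The sketch assumes that when external shuffle is enabled, a lost executor's shuffle files can still be served by the external server. Under that assumption, the executor's registered outputs are kept unless a fetch has actually failed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MapOutputRegistry:
    """Hypothetical stand-in for a map-output tracker: executor id -> map ids it produced."""
    outputs: Dict[str, List[int]] = field(default_factory=dict)

    def register(self, executor_id: str, map_id: int) -> None:
        self.outputs.setdefault(executor_id, []).append(map_id)

    def remove_executor(self, executor_id: str) -> List[int]:
        # Forget every output from this executor; the returned map ids must be recomputed.
        return self.outputs.pop(executor_id, [])


def should_drop_map_outputs(external_shuffle_enabled: bool, from_fetch_failure: bool) -> bool:
    # (1) Without external shuffle, the files died with the executor.
    # (2) A fetch failure is direct evidence the outputs are unreachable.
    return (not external_shuffle_enabled) or from_fetch_failure


def handle_executor_lost(registry: MapOutputRegistry, executor_id: str,
                         external_shuffle_enabled: bool,
                         from_fetch_failure: bool) -> List[int]:
    """Return the map ids that need recomputation (empty if outputs are kept)."""
    if should_drop_map_outputs(external_shuffle_enabled, from_fetch_failure):
        return registry.remove_executor(executor_id)
    return []
```

Killing an executor mid-phase with external shuffle enabled, `handle_executor_lost(registry, "exec-1", True, False)`, returns an empty list and leaves that executor's outputs registered. Only a later fetch failure (`from_fetch_failure=True`) causes them to be dropped and recomputed. Keeping the check in one predicate means both conditions are evaluated in a single place.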